32 Measuring Divergence of Investor Opinion

A foundational question in financial economics concerns how differences in investor beliefs affect asset prices and trading activity. In markets where investors hold heterogeneous expectations about a firm’s future cash flows, the aggregation of these divergent views into a single market price becomes a non-trivial exercise with profound implications for asset valuation, return predictability, and market efficiency. The concept of divergence of investor opinion (hereafter DIVOP) has emerged as a central construct in both the accounting and finance literatures, serving as a lens through which researchers examine the information environment of firms, the dynamics of uncertainty resolution, and the nature of market reactions to news.

The theoretical foundations of the DIVOP literature trace back to Miller (1977), who proposed that when investors disagree about the value of a security and short-sale constraints prevent pessimistic investors from fully expressing their views, the market price will reflect the valuation of the most optimistic investors. This leads to systematic overpricing that is increasing in the degree of opinion divergence. The overpricing persists until information events, such as earnings announcements, reduce disagreement and prices converge toward fundamental values (Berkman et al. 2009). Varian (1985) offers an alternative perspective in which divergence of opinion represents an additional risk factor, leading to higher rather than lower expected returns, creating a theoretical tension that has motivated extensive empirical investigation.

The empirical literature on DIVOP has expanded considerably since these seminal contributions. Researchers have documented that divergence of opinion helps explain a range of asset pricing anomalies, including post-earnings announcement drift (Garfinkel and Sokobin 2006; K. L. Anderson, Harris, and So 2007), the cross-sectional return difference between value and growth stocks (Doukas, Kim, and Pantzalis 2004), short- and long-run post-IPO returns (Houge et al. 2001), pre- and post-acquisition stock returns (Alexandridis, Antoniou, and Petmezas 2007), takeover premia (Chatterjee, John, and Yan 2012), and the broad cross-section of stock returns (Diether, Malloy, and Scherbina 2002; Doukas, Kim, and Pantzalis 2006). The explanatory power of DIVOP has been demonstrated using a rich set of empirical proxies, ranging from analyst forecast dispersion and abnormal trading volume to bid-ask spreads and idiosyncratic volatility.

Despite the maturity of the DIVOP literature in developed markets, particularly the United States, its application to emerging markets remains remarkably thin. This gap is especially notable given that the theoretical conditions under which divergence of opinion matters most (namely, binding short-sale constraints, information asymmetry, and heterogeneous investor sophistication) are arguably more prevalent in emerging markets than in their developed counterparts. The Vietnamese equity market presents a compelling laboratory for studying investor disagreement. The market is characterized by several features that amplify the relevance of the DIVOP framework:

Binding short-sale constraints. Short selling was not permitted in Vietnam until January 2025, and even after its introduction, the mechanism remains restricted to a limited set of securities with significant regulatory constraints on execution. This closely mirrors the theoretical setting of Miller (1977), where pessimistic investors are unable to fully express their views through short positions.
Dominance of retail investors. Individual investors account for approximately 80-85% of daily trading volume on HOSE and HNX, compared to roughly 25% in the United States. Retail investors are more susceptible to behavioral biases, sentiment-driven trading, and information processing limitations that naturally give rise to heterogeneous beliefs (Phan et al. 2023).
Information asymmetry and transparency challenges. Despite improvements in disclosure standards, Vietnam’s regulatory framework for corporate reporting remains less stringent than those in developed markets. Selective disclosure, delayed filing of financial statements, and limited enforcement of insider trading regulations create an environment in which investors operate with substantially different information sets (Vo and Phan 2017).
Foreign ownership limits. Caps on foreign ownership (currently 49% for most sectors, with exceptions) create a segmented market where domestic and foreign investors may hold systematically different views about firm value, amplifying the divergence of opinion.
Thin analyst coverage. Whereas a typical S&P 500 firm is followed by 15-25 sell-side analysts, coverage of Vietnamese equities is concentrated among a relatively small number of domestic brokerages and a handful of international research houses. This limits the informativeness of traditional analyst-based DIVOP measures and necessitates greater reliance on market-based proxies.

This chapter provides a methodology for constructing multiple proxies for divergence of investor opinion adapted to the institutional characteristics of the Vietnamese market. We draw on the methodological frameworks established by Garfinkel (2009) and Diether, Malloy, and Scherbina (2002), while introducing modifications that account for the microstructure of Vietnamese exchanges, the $T+2$ settlement cycle, the absence (until recently) of short selling, and the availability of data through domestic financial platforms. Specifically, we construct and analyze the following DIVOP proxies:

Unexplained Volume (DTO): Market-adjusted turnover detrended by its rolling median, capturing abnormal trading activity attributable to disagreement after controlling for liquidity and market-wide effects.
Standardized Unexplained Volume (SUV): A regression-based measure that explicitly controls for the informedness and liquidity components of volume by modeling turnover as a function of signed returns.
Stock Return Volatility (VOLATILITY): The standard deviation of daily returns over a rolling estimation window, serving as a proxy for the dispersion of investor valuations.
Bid-Ask Spread (BASPREAD): The proportional quoted spread, reflecting the adverse selection component associated with heterogeneous information among market participants.
Analyst Forecast Dispersion (DISP): The cross-sectional standard deviation of individual analyst earnings forecasts, directly measuring disagreement among informed market participants.
Idiosyncratic Volatility (IVOL): The residual volatility from a factor model regression, isolating the firm-specific component of return variation that reflects divergent investor interpretations of firm-level information.
Amihud Illiquidity (ILLIQ): The price impact ratio proposed by Amihud (2002), which captures the information asymmetry dimension of disagreement through the price response to order flow.

For each proxy, we describe the theoretical motivation, the data requirements, the construction methodology adapted for Vietnamese data, the empirical properties observed in the Vietnamese cross-section, and the practical considerations that researchers should bear in mind when employing these measures. We pay particular attention to issues that are specific to emerging markets, including thin trading, corporate action adjustments, exchange-specific microstructure effects, and the interplay between foreign ownership constraints and measures of investor disagreement.

33 Theoretical Framework

33.1 The Miller (1977) Overpricing Hypothesis

The canonical model of divergence of opinion and asset pricing begins with Miller (1977). Miller’s central insight is simple: in a market where investors hold heterogeneous beliefs about the future payoffs of a risky asset and short-sale constraints prevent some investors from acting on their pessimistic views, the equilibrium price will be set by the subset of investors who are most optimistic about the asset’s value. The severity of overpricing is increasing in both the degree of opinion divergence and the stringency of short-sale constraints. Formally, if investor $i$ assigns a valuation $V_i$ to a security, the market price $P$ satisfies:

\[ P = E[V_i \mid V_i \geq V^*] \]

where $V^*$ is the marginal investor’s valuation, which exceeds the unconditional mean valuation $E[V_i]$ whenever short-sale constraints bind for some investors. The degree of overpricing is:

\[ \text{Overpricing} = P - E[V_i] = E[V_i \mid V_i \geq V^*] - E[V_i] \]

which is positive and increasing in the dispersion of the distribution of $V_i$ (i.e., divergence of opinion) and in $V^*$ (i.e., the severity of short-sale constraints).
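As a purely illustrative check of this comparative static, suppose valuations are normally distributed, $V_i \sim N(\mu, \sigma^2)$, and that the short-sale constraint is summarized by the fraction $q$ of the most optimistic investors who end up holding the stock, so that $V^*$ is the $(1-q)$ quantile. Both the normality and the fixed holder fraction are assumptions made here for illustration; they are not part of Miller’s argument. Under them, the truncated-normal mean gives $\text{Overpricing} = \sigma \, \phi(z) / (1 - \Phi(z))$ with $z = \Phi^{-1}(1-q)$, which is linear in $\sigma$ and rises as $q$ falls (that is, as $V^*$ rises). The function name `miller_overpricing` is hypothetical.

```python
from statistics import NormalDist


def miller_overpricing(sigma, holder_fraction):
    """Overpricing E[V | V >= V*] - E[V] for V ~ N(mu, sigma^2).

    Illustrative assumption: V* is the (1 - holder_fraction) quantile,
    i.e. only the most optimistic `holder_fraction` of investors hold
    the stock. The result does not depend on mu.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - holder_fraction)
    # Inverse Mills ratio: mean of a standard normal truncated below at z
    inverse_mills = std_normal.pdf(z) / (1.0 - std_normal.cdf(z))
    return sigma * inverse_mills


for sigma in (1.0, 2.0, 4.0):
    for q in (0.5, 0.2):
        print(f"sigma={sigma:.1f}, holders={q:.0%}: "
              f"overpricing={miller_overpricing(sigma, q):.3f}")
```

Doubling the dispersion doubles the overpricing at any given holder fraction, and a smaller holder fraction (a higher $V^*$) raises it further.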

Miller’s model generates several testable predictions:

Cross-sectional prediction: Stocks with higher divergence of opinion should have lower subsequent returns as prices gradually correct toward fundamental values.
Time-series prediction: Information events that reduce disagreement (e.g., earnings announcements) should be associated with negative abnormal returns for high-DIVOP stocks, as the “optimism premium” dissipates.
Interaction prediction: The overpricing effect should be strongest among stocks that simultaneously exhibit high divergence of opinion and binding short-sale constraints.

33.2 Alternative Theoretical Perspectives

Varian (1985) proposes an alternative framework in which divergence of opinion acts as a risk factor. If investors are risk-averse and disagreement represents genuine uncertainty about future payoffs, then higher dispersion of beliefs should be associated with higher expected returns as compensation for bearing the additional risk. This creates a sharp empirical dichotomy: the Miller hypothesis predicts a negative DIVOP-return relation, whereas the Varian model predicts a positive relation.

The distinction between these theories hinges critically on the market microstructure and institutional setting (Table 33.1).

Table 33.1: Summary of theoretical predictions for the DIVOP-return relation under different assumptions

Theoretical Framework	Short-Sale Constraints	DIVOP-Return Relation	Key Mechanism
Miller (1977)	Binding	Negative	Optimistic bias in price
Varian (1985)	Non-binding	Positive	Risk premium for uncertainty
Hong and Stein (2003)	Binding, gradual info	Negative, time-varying	Slow diffusion of bearish views
Scheinkman and Xiong (2003)	Binding, overconfidence	Negative	Speculative bubble premium

Hong and Stein (2003) extend Miller’s framework by incorporating gradual information diffusion. In their model, bearish information is impounded into prices more slowly than bullish information because short-sale constraints raise the cost of acting on negative views. This generates momentum-like patterns in which high-DIVOP stocks exhibit positive short-run returns (as optimists push prices up) followed by negative long-run returns (as bearish information eventually reaches the market).

Scheinkman and Xiong (2003) introduce an additional dimension by noting that when investors are overconfident about their private signals and short-sale constraints bind, stock prices contain a “speculative bubble” component that reflects the option value of reselling the asset to a future investor who may be even more optimistic. This model predicts that both high trading volume and high price volatility should be associated with overpricing, providing a theoretical basis for using volume-based and volatility-based DIVOP proxies.

33.3 Relevance to the Vietnamese Market

The Vietnamese equity market provides an unusually clean setting for testing the Miller hypothesis. The market operated without any short-selling mechanism from its inception in 2000 through January 2025. For a full quarter-century, the first necessary condition of Miller’s model (binding short-sale constraints) was therefore satisfied by regulation rather than by market frictions. Even after the introduction of covered short selling in 2025, the mechanism remains restricted to securities meeting specific liquidity and market capitalization thresholds, and the regulatory environment imposes borrowing requirements that significantly raise the cost of shorting relative to developed markets.

The dominance of retail investors amplifies the second necessary condition (i.e., heterogeneous beliefs). Research on the Vietnamese market has documented significant herding behavior (Vo and Phan 2017; Vo 2015), sentiment-driven trading (Phan et al. 2023; Nguyen and Pham 2018), and information asymmetry between domestic and foreign investors (Vo 2017). These behavioral characteristics naturally generate wider dispersion of investor valuations compared to markets dominated by institutional investors with access to similar analytical frameworks and information sources.

Table 33.2 compares key institutional features relevant to the DIVOP framework between Vietnam and the United States.

Table 33.2: Institutional comparison of Vietnam and the United States relevant to divergence of opinion

Feature	Vietnam (HOSE/HNX)	United States (NYSE/NASDAQ)
Short selling	Introduced Jan 2025 (limited)	Permitted (Reg SHO since 2005)
Retail investor share of volume	~80-85%	~25%
Settlement cycle	T+2 (T+1 planned for 2026)	T+1 (since May 2024)
Daily price limits	$\pm$ 7% (HOSE), $\pm$ 10% (HNX)	None
Foreign ownership cap	49% (most sectors)	None
Average analyst coverage (VN30)	5-10 analysts	15-25 analysts
Mandatory quarterly reporting	Yes (since 2012)	Yes
Options/derivatives market	VN30 Index Futures (since 2017)	Extensive options/futures

The presence of daily price limits ($\pm$ 7% on HOSE and $\pm$ 10% on HNX) creates an additional mechanism through which divergence of opinion can be amplified. When a stock hits its price limit, investors who wish to trade in the direction of the limit are unable to do so, leading to accumulated unfilled orders and delayed price discovery. This institutional feature may create short-term spikes in measured DIVOP that reflect limit-induced friction rather than genuine disagreement. We address this issue in our empirical methodology by flagging limit-hit days and conducting robustness checks that exclude these observations.

34 Data Sources and Sample Construction

34.1 Data Sources

The construction of DIVOP proxies for the Vietnamese market requires daily stock-level trading data and, for the analyst dispersion measures, individual analyst forecast data. We source all data from DataCore.vn, which provides coverage of all securities listed on HOSE, HNX, and the UPCoM (Unlisted Public Company Market) exchange. Table 34.1 summarizes the datasets and key variables used in this study.

Table 34.1: Data sources and key variables for DIVOP proxy construction

Dataset	Key Variables	Frequency
Daily Stock Trading	Close price, high, low, open, volume, shares outstanding, adjusted price, bid, ask	Daily
Corporate Actions	Dividends, stock splits, bonus issues, rights offerings	Event-based
Company Information	Exchange code, industry classification (ICB), listing date, delisting date	Static/Periodic
Analyst Forecasts	Individual analyst EPS forecasts, announcement dates, fiscal period end, analyst ID, broker name	Per estimate
Market Index	VN-Index daily returns, VN30 returns, HNX-Index returns	Daily
Foreign Ownership	Foreign buy/sell volume, foreign ownership percentage, remaining foreign room	Daily

34.2 Sample Construction

All code in this chapter relies on the following imports and configuration parameters:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from sklearn.linear_model import LinearRegression
from scipy import stats as scipy_stats
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# =============================================================================
# Configuration Parameters
# =============================================================================
# Users can modify these parameters to adjust the methodology
CONFIG = {
    # Sample period
    'beg_date': '2007-01-01',
    'end_date': '2024-12-31',
    
    # Estimation windows (in trading days)
    'est_window': 60,          # Rolling window for SUV and volatility
    'detrend_window': 180,     # Window for DTO detrending median
    'lag': 7,                  # Lag for DTO detrending
    'gap': 5,                  # Gap between estimation period and event date
    
    # Filters
    'min_price': 1000,         # Minimum price in VND
    'min_volume_days': 0.8,    # Min fraction of non-zero volume days in window
    'min_analysts': 3,         # Minimum number of analysts for DISP
    'max_spread_pct': 0.50,    # Maximum bid-ask spread as fraction of midpoint
    'forecast_carry_days': 105,# Days to carry forward stale analyst forecasts
    
    # Exchange identifiers
    'exchanges': ['HOSE', 'HNX'],
    
    # Price limit thresholds (for flagging)
    'price_limit_hose': 0.07,
    'price_limit_hnx': 0.10,
}

print("Configuration parameters loaded successfully.")
print(f"Sample period: {CONFIG['beg_date']} to {CONFIG['end_date']}")
print(f"Estimation window: {CONFIG['est_window']} trading days")
print(f"Detrending window: {CONFIG['detrend_window']} trading days")

Configuration parameters loaded successfully.
Sample period: 2007-01-01 to 2024-12-31
Estimation window: 60 trading days
Detrending window: 180 trading days

The sample universe includes all common stocks (ordinary shares) listed on HOSE and HNX during the period January 2007 through December 2024. We begin in 2007 rather than at market inception (2000 for HOSE, 2005 for HNX) for two reasons. First, the early years of the Vietnamese market were characterized by an extremely small number of listed firms (fewer than 30 on HOSE through 2005), making cross-sectional analysis unreliable. Second, data quality and consistency improve substantially after the market expansion of 2006-2007, during which the number of listed firms on HOSE grew from approximately 40 to over 100.

We apply the following filters to construct the analysis sample:

Security type filter. We retain only common stocks (ordinary shares), excluding preferred shares, exchange-traded funds (ETFs), covered warrants, and certificates of deposit. This is analogous to the standard filter in the U.S. literature that restricts to CRSP share codes 10 and 11.
Exchange filter. We include stocks listed on HOSE and HNX but exclude UPCoM securities in our baseline analysis. UPCoM is a registration-based trading venue with less stringent listing requirements and substantially lower liquidity, which may introduce noise into volume-based and spread-based measures. We include UPCoM in robustness checks.
Price filter. We exclude stock-day observations with closing prices below 1,000 VND. This threshold serves the same purpose as the “penny stock” exclusion common in U.S. studies (typically $1 or $5 thresholds) and helps mitigate the influence of extreme percentage returns and spreads at very low price levels.
Minimum trading activity. For volume-based measures, we require that a stock has non-zero trading volume on at least 80% of trading days within each estimation window. This filter eliminates the most thinly traded securities for which turnover-based measures would be unreliable.
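The minimum-trading-activity filter can be sketched as follows. This is one possible implementation and not necessarily the one behind the chapter’s results: the helper name `flag_active_windows` and the output column `active_window` are hypothetical, and the sketch assumes one row per stock and trading day (zero-volume days included) with `ticker`, `date`, and `volume` columns.

```python
import pandas as pd


def flag_active_windows(df, window=60, min_frac=0.8):
    """Flag stock-days whose trailing window has enough non-zero-volume days.

    Hypothetical helper. Assumes one row per ticker and trading day,
    with zero-volume days present as rows (volume == 0).
    """
    df = df.sort_values(['ticker', 'date']).copy()
    traded = (df['volume'] > 0).astype(float)
    # Share of days with trading over the trailing `window` rows, per stock
    frac_traded = (
        traded.groupby(df['ticker'])
              .transform(lambda s: s.rolling(window, min_periods=window).mean())
    )
    # NaN (window not yet full) compares as False, so early rows are not flagged
    df['active_window'] = frac_traded >= min_frac
    return df


# Toy example: 'AAA' trades every day, 'BBB' trades on one day in five
dates = pd.bdate_range('2024-01-01', periods=10)
toy = pd.DataFrame({
    'ticker': ['AAA'] * 10 + ['BBB'] * 10,
    'date': list(dates) * 2,
    'volume': [100] * 10 + [0, 0, 0, 0, 50] * 2,
})
flagged = flag_active_windows(toy, window=5, min_frac=0.8)
print(flagged.groupby('ticker')['active_window'].sum())
```

The flag here is based on a trailing window that ends on the current day. To match an estimation window that ends `gap + 1` trading days earlier, the flag could be shifted within each ticker by that amount.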

def load_daily_data(config):
    """
    Load daily stock trading data from DataCore.vn.
    
    In practice, this function connects to the DataCore API or reads
    from a local database/CSV. Here we document the expected schema.
    
    Expected columns:
    - ticker: str, stock ticker symbol (e.g., 'VCB', 'HPG', 'VNM')
    - date: datetime, trading date
    - open, high, low, close: float, daily OHLC prices (VND)
    - volume: int, trading volume (shares)
    - shares_outstanding: int, total shares outstanding
    - adjusted_close: float, price adjusted for corporate actions
    - adj_factor: float, cumulative adjustment factor
    - bid, ask: float, best bid/ask at close
    - exchange: str, exchange code ('HOSE', 'HNX', 'UPCOM')
    - industry_icb: str, ICB industry classification code
    - foreign_buy_vol, foreign_sell_vol: int, foreign investor volumes
    - foreign_ownership_pct: float, foreign ownership percentage
    """
    # =========================================================================
    # Replace with actual DataCore API call:
    # from datacore import Client
    # client = Client(api_key='YOUR_KEY')
    # df = client.daily_stock(
    #     start=config['beg_date'], end=config['end_date'],
    #     exchanges=config['exchanges']
    # )
    # =========================================================================
    print("Connect to DataCore.vn and load daily stock data.")
    print("Expected schema: ticker, date, open, high, low, close, volume,")
    print("  shares_outstanding, adjusted_close, adj_factor, bid, ask,")
    print("  exchange, industry_icb, foreign_buy_vol, foreign_sell_vol,")
    print("  foreign_ownership_pct")
    return None  # Replace with actual data


def apply_sample_filters(df, config):
    """Apply sequential sample construction filters."""
    print("\n=== Sample Construction ===")
    n0 = len(df)
    
    # Date filter
    df = df[(df['date'] >= config['beg_date']) &
            (df['date'] <= config['end_date'])].copy()
    print(f"[1] Date filter: {len(df):,} obs (from {n0:,})")
    
    # Exchange filter
    df = df[df['exchange'].isin(config['exchanges'])].copy()
    print(f"[2] Exchange filter ({config['exchanges']}): {len(df):,} obs")
    
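    # Compute daily returns from adjusted prices BEFORE the price filter,
    # so that dropping low-price days does not turn the next surviving
    # observation's return into a multi-day return.
    df = df.sort_values(['ticker', 'date'])
    df['ret'] = (df.groupby('ticker')['adjusted_close']
                   .pct_change(fill_method=None))
    
    # Price filter
    df = df[df['close'] >= config['min_price']].copy()
    print(f"[3] Price >= {config['min_price']:,} VND: {len(df):,} obs")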
    
    # Flag price limit hits
    df['limit_hit'] = (
        ((df['exchange'] == 'HOSE') &
         (df['ret'].abs() >= config['price_limit_hose'] - 0.001)) |
        ((df['exchange'] == 'HNX') &
         (df['ret'].abs() >= config['price_limit_hnx'] - 0.001))
    )
    
    n_tickers = df['ticker'].nunique()
    print(f"\nFinal sample: {len(df):,} stock-day obs, "
          f"{n_tickers} unique tickers")
    print(f"Limit-hit days: {df['limit_hit'].sum():,} "
          f"({100*df['limit_hit'].mean():.2f}%)")
    return df

34.3 Corporate Action Adjustments

Proper adjustment for corporate actions is critical for volume-based DIVOP measures, as events such as stock splits, bonus share issues, and rights offerings change the number of shares outstanding and can create artificial spikes in measured turnover. We use cumulative adjustment factors that account for stock dividends (bonus shares), stock splits, and rights offerings. Cash dividends affect the price adjustment only and should not enter the factor applied to share quantities. With these factors we construct adjusted volume and adjusted shares outstanding:

\[ \text{AdjVolume}_{i,t} = \text{Volume}_{i,t} \times \text{CumAdjFactor}_{i,t} \]

\[ \text{AdjSharesOut}_{i,t} = \text{SharesOut}_{i,t} \times \text{CumAdjFactor}_{i,t} \]

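Because the same factor multiplies both numerator and denominator, firm-level turnover on a given day is unchanged by the adjustment. Its role is to put volume and share counts on a common share basis across time, which matters when these quantities are summed across firms to form market-wide turnover.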

def adjust_for_corporate_actions(df):
    """Apply cumulative adjustment factors to volume and shares outstanding."""
    df = df.copy()
    df['adj_volume'] = df['volume'] * df['adj_factor']
    df['adj_shares_out'] = df['shares_outstanding'] * df['adj_factor']
    
    # Daily turnover ratio
    df['turnover'] = np.where(
        df['adj_shares_out'] > 0,
        df['adj_volume'] / df['adj_shares_out'],
        np.nan
    )
    
    # Flag extreme turnover (> 50% of shares outstanding traded in one day)
    extreme = df['turnover'] > 0.50
    if extreme.any():
        print(f"Warning: {extreme.sum()} obs with turnover > 50%, set to NaN")
        df.loc[extreme, 'turnover'] = np.nan
    
    return df

34.4 Trading Calendar Construction

The rolling regression approach for SUV and volatility requires a trading calendar that ensures each estimation window contains exactly the specified number of trading days. We construct this directly from observed trading dates.

def build_trading_calendar(df, config):
    """
    Map each trading date to its estimation window [est_start, est_end].
    
    For date t, the estimation window runs from
    t - gap - est_window to t - gap - 1 (in trading-day terms).
    """
    trading_dates = sorted(df['date'].unique())
    trading_dates = pd.Series(trading_dates)
    
    est_window = config['est_window']
    gap = config['gap']
    offset = est_window + gap
    
    records = []
    for i in range(offset, len(trading_dates)):
        records.append({
            'date': trading_dates.iloc[i],
            'est_start': trading_dates.iloc[i - gap - est_window],
            'est_end': trading_dates.iloc[i - gap - 1]
        })
    
    calendar = pd.DataFrame(records)
    print(f"Trading calendar: {len(calendar)} dates, "
          f"{calendar['date'].min()} to {calendar['date'].max()}")
    return calendar

35 Volume-Based DIVOP Proxies

35.1 Theoretical Motivation

Trading volume has long been recognized as a natural proxy for divergence of investor opinion. The “no-trade theorem” of Milgrom and Stokey (1982) shows that rational investors who share common priors will not trade on the arrival of private information alone. By contrapositive, observed trading volume must reflect either heterogeneous beliefs or non-informational motives such as liquidity needs. Harris and Raviv (1993) and Kandel and Pearson (1995) formalize this intuition, showing that trading volume is positively related to the dispersion of investors’ prior beliefs and to the degree to which public information is differentially interpreted.

The challenge in using raw trading volume as a DIVOP proxy is that volume is also driven by factors unrelated to disagreement, including portfolio rebalancing, liquidity needs, tax-loss selling, and index reconstitution effects. Garfinkel (2009) proposes two approaches to extract the disagreement component from raw volume. The first, Unexplained Volume (DTO), removes market-wide volume effects and secular trends. The second, Standardized Unexplained Volume (SUV), additionally controls for the information content of returns through a firm-level time-series regression of turnover on signed returns, isolating the “pure disagreement” component of trading activity.

35.2 Unexplained Volume (DTO)

35.2.1 Construction Methodology

The construction of the Unexplained Volume measure proceeds in four steps.

Step 1: Compute firm-level daily turnover. For each stock $i$ on day $t$:

\[ \text{Turn}_{i,t} = \frac{\text{AdjVolume}_{i,t}}{\text{AdjSharesOut}_{i,t}} \]

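Step 2: Compute market-wide turnover. We calculate aggregate turnover across all common stocks as total adjusted volume divided by total adjusted shares outstanding, which is equivalent to an average of firm-level turnover weighted by shares outstanding: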

\[ \text{MktTurn}_{t} = \frac{\sum_{i} \text{AdjVolume}_{i,t}}{\sum_{i} \text{AdjSharesOut}_{i,t}} \]

Unlike the U.S. methodology that computes market turnover across NYSE/AMEX stocks only and applies a scaling adjustment for NASDAQ securities (following A.-M. Anderson and Dyl 2005), we compute market turnover across all HOSE and HNX common stocks without any exchange-specific volume scaling. Both Vietnamese exchanges operate as order-driven markets (HOSE uses continuous order matching; HNX uses a combination of continuous matching and periodic call auctions) without the dealer-market double-counting issue that necessitates the NASDAQ volume adjustment in U.S. studies.

Step 3: Compute market-adjusted turnover.

\[ \text{MATO}_{i,t} = \text{Turn}_{i,t} - \text{MktTurn}_{t} \]

Step 4: Detrend by rolling median. To remove secular trends in firm-specific trading activity:

\[ \text{DTO}_{i,t} = \text{MATO}_{i,t} - \text{Median}_{180}(\text{MATO}_{i,t-7}) \]

where $\text{Median}_{180}(\text{MATO}_{i,t-7})$ is the median of market-adjusted turnover over the 180-trading-day window ending 7 trading days before date $t$, that is, over trading days $t-186$ through $t-7$. The 7-day lag keeps the current day’s turnover, and any run-up in trading immediately before it, out of its own detrending baseline.

def compute_market_turnover(df):
    """Compute daily market-wide turnover across all stocks."""
    totals = df.groupby('date')[['adj_volume', 'adj_shares_out']].sum()
    # Aggregate volume over aggregate shares; NaN when no shares are recorded
    mkt_turn = (totals['adj_volume'] /
                totals['adj_shares_out'].where(totals['adj_shares_out'] > 0))
    return mkt_turn.rename('market_turnover').reset_index()


def compute_dto(df, config):
    """
    Construct Unexplained Volume (DTO).
    
    Steps:
    1. Subtract market turnover -> MATO
    2. Rolling 180-day median of MATO (lagged 7 days) -> trend
    3. DTO = MATO - trend
    """
    detrend_window = config['detrend_window']
    lag = config['lag']
    
    # Market turnover
    mkt_turn = compute_market_turnover(df)
    df = df.merge(mkt_turn, on='date', how='left')
    
    # Market-adjusted turnover
    df['mato'] = df['turnover'] - df['market_turnover']
    
    # Rolling median with lag, computed per stock.
    # transform() keeps the result aligned with df's index, including
    # when the sample contains a single ticker.
    df = df.sort_values(['ticker', 'date'])
    df['mato_trend'] = df.groupby('ticker')['mato'].transform(
        lambda s: s.rolling(
            window=detrend_window,
            min_periods=int(detrend_window * 0.5)
        ).median().shift(lag)
    )
    
    # DTO
    df['dto'] = df['mato'] - df['mato_trend']
    
    print("DTO construction complete.")
    print(f"  Non-missing: {df['dto'].notna().sum():,}")
    print(f"  Mean: {df['dto'].mean():.6f}, Std: {df['dto'].std():.6f}")
    return df

35.2.2 Vietnam-Specific Considerations for DTO

Several features of the Vietnamese market require attention when constructing DTO:

No NASDAQ-type volume adjustment needed. Both HOSE and HNX are order-driven auction markets. The double-counting adjustment applied to NASDAQ securities in the U.S. literature is not necessary.
Thinly traded stocks. A substantial fraction of listed Vietnamese stocks, particularly on HNX, may have zero volume on many trading days. For stocks with intermittent trading, the rolling median may be biased toward zero, making DTO less informative. We require at least 80% non-zero volume days in each estimation window.
Price limit effects on volume. When a stock hits its daily price limit, unfilled orders accumulate and recorded volume may understate true clearing volume. The following day often shows a “catch-up” effect. Researchers should consider flagging limit-hit days.
Foreign investor trading decomposition. DataCore provides volume by investor type (foreign versus domestic). Researchers may wish to construct separate DTO measures for foreign and domestic volume, or use the foreign-to-domestic volume ratio as an additional dimension of disagreement.
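Two of these considerations can be sketched in code. The snippet is illustrative only: it assumes the `limit_hit` flag and the `foreign_buy_vol` and `foreign_sell_vol` columns described earlier, and the names `add_dto_diagnostics`, `post_limit_day`, and `foreign_vol_share` are hypothetical. How foreign participation is best measured is a judgment call; here it is taken, as an assumption, to be the average of foreign buy and sell volume relative to total volume.

```python
import numpy as np
import pandas as pd


def add_dto_diagnostics(df):
    """Add a post-limit-hit flag and a foreign participation ratio.

    Hypothetical helper; expects columns ticker, date, volume,
    limit_hit (bool), foreign_buy_vol, foreign_sell_vol.
    """
    df = df.sort_values(['ticker', 'date']).copy()
    # Day after a limit hit, where "catch-up" volume may appear.
    # Casting to float lets the first row of each ticker be NaN -> False.
    prev_hit = df['limit_hit'].astype(float).groupby(df['ticker']).shift(1)
    df['post_limit_day'] = prev_hit.eq(1.0)
    # Every trade has one buyer and one seller, so average the two sides
    # (an assumption made for this sketch)
    foreign_vol = 0.5 * (df['foreign_buy_vol'] + df['foreign_sell_vol'])
    df['foreign_vol_share'] = np.where(
        df['volume'] > 0, foreign_vol / df['volume'], np.nan
    )
    return df


toy = pd.DataFrame({
    'ticker': ['AAA'] * 4,
    'date': pd.bdate_range('2024-01-01', periods=4),
    'volume': [1000, 200, 1500, 0],
    'limit_hit': [False, True, False, False],
    'foreign_buy_vol': [100, 0, 300, 0],
    'foreign_sell_vol': [300, 0, 300, 0],
})
out = add_dto_diagnostics(toy)
print(out[['date', 'limit_hit', 'post_limit_day', 'foreign_vol_share']])
```

A robustness check could then recompute summary statistics for DTO after dropping rows where either `limit_hit` or `post_limit_day` is true.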

35.3 Standardized Unexplained Volume (SUV)

35.3.1 Construction Methodology

The Standardized Unexplained Volume measure, proposed by Garfinkel (2009), isolates the disagreement component of volume by explicitly controlling for the information content of returns. The insight is that trading volume has both a liquidity component and an informedness component correlated with the magnitude and sign of returns. By regressing turnover on signed returns and extracting the standardized residual, SUV captures volume attributable to disagreement after controlling for both liquidity trends and information-driven trading.

For each stock $i$, on each trading date $t$, we estimate using data from the estimation window $[\tau_1, \tau_2]$:

\[ \text{Turn}_{i,s} = \alpha_i + \beta_i^{+} \cdot \text{RetPos}_{i,s} + \beta_i^{-} \cdot \text{RetNeg}_{i,s} + \epsilon_{i,s}, \quad s \in [\tau_1, \tau_2] \tag{35.1}\]

where $\text{RetPos}_{i,s} = |r_{i,s}| \cdot \mathbf{1}(r_{i,s} > 0)$ and $\text{RetNeg}_{i,s} = |r_{i,s}| \cdot \mathbf{1}(r_{i,s} < 0)$.

The Standardized Unexplained Volume on date $t$ is:

\[ \text{SUV}_{i,t} = \frac{\text{Turn}_{i,t} - \hat{\text{Turn}}_{i,t}}{\hat{\sigma}_{\epsilon,i}} \tag{35.2}\]

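where $\hat{\text{Turn}}_{i,t}$ is the turnover predicted from the date-$t$ return components using the coefficients estimated in Equation 35.1, and $\hat{\sigma}_{\epsilon,i}$ is the root mean squared error (RMSE) of that regression’s residuals.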

The asymmetric specification with separate coefficients for positive and negative returns reflects that the volume-return relation differs by return sign. In the U.S., buying pressure tends to generate more volume than selling pressure due to short-sale frictions. In Vietnam, where short selling was unavailable until 2025, this asymmetry should be even more pronounced because all selling activity was constrained to existing shareholders.

def compute_suv(df, calendar, config):
    """
    Compute Standardized Unexplained Volume via rolling regressions.
    
    For each stock-date, regress Turn on RetPos and RetNeg over the
    estimation window, then compute SUV = (actual - predicted) / RMSE.
    """
    est_window = config['est_window']
    min_obs = int(est_window * config['min_volume_days'])
    
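    # Prepare signed return components. Rows with a missing return are
    # dropped first; otherwise np.where would silently code them as
    # zero-return days in the estimation window and on the event date.
    df = df.dropna(subset=['ret']).copy()
    df['ret_pos'] = df['ret'].clip(lower=0)        # |r| if r > 0, else 0
    df['ret_neg'] = (-df['ret']).clip(lower=0)     # |r| if r < 0, else 0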
    
    results = []
    grouped = {t: g for t, g in df.groupby('ticker')}
    
    for _, cal_row in calendar.iterrows():
        dt = cal_row['date']
        est_s, est_e = cal_row['est_start'], cal_row['est_end']
        
        for ticker, tdata in grouped.items():
            # Estimation window
            est = tdata[
                (tdata['date'] >= est_s) & (tdata['date'] <= est_e)
            ].dropna(subset=['turnover', 'ret_pos', 'ret_neg'])
            
            if len(est) < min_obs:
                continue
            
            # Event date
            evt = tdata[tdata['date'] == dt]
            if evt.empty or evt['turnover'].isna().all():
                continue
            
            # OLS: Turn = alpha + beta_pos * RetPos + beta_neg * RetNeg
            X = est[['ret_pos', 'ret_neg']].values
            y = est['turnover'].values
            
            reg = LinearRegression().fit(X, y)
            y_hat = reg.predict(X)
            rmse = np.sqrt(np.mean((y - y_hat) ** 2))
            
            if rmse <= 0:
                continue
            
            # Predict and standardize for event date
            X_evt = evt[['ret_pos', 'ret_neg']].values
            pred = reg.predict(X_evt)[0]
            actual = evt['turnover'].values[0]
            suv = (actual - pred) / rmse
            
            results.append({
                'ticker': ticker, 'date': dt,
                'suv': suv,
                'predicted_turnover': pred,
                'rmse_turn': rmse,
                'r_squared_turn': reg.score(X, y),  # in-sample R^2
                'n_est': len(est),
                'alpha_turn': reg.intercept_,
                'beta_pos': reg.coef_[0],
                'beta_neg': reg.coef_[1],
            })
    
    suv_df = pd.DataFrame(results)
    print(f"SUV: {len(suv_df):,} stock-date obs")
    print(f"  Mean: {suv_df['suv'].mean():.4f}, "
          f"Median: {suv_df['suv'].median():.4f}")
    return suv_df

35.3.2 Interpreting the SUV Regression Coefficients

The estimated coefficients from Equation 35.1 are informative about market microstructure. Garfinkel (2009) reports $\hat{\beta}^{+} > \hat{\beta}^{-}$ for most U.S. stocks. In Vietnam, we expect this asymmetry to be even stronger because:

No short selling (pre-2025): All selling is by existing shareholders, limiting volume response to negative returns.
T+2 settlement: Investors cannot immediately reinvest sale proceeds, further dampening sell-side volume.
Price limits: The $\pm$ 7% (HOSE) and $\pm$ 10% (HNX) daily limits truncate the return distribution, compressing the range of both regressors.

Researchers should report summary statistics of $(\hat{\alpha}, \hat{\beta}^{+}, \hat{\beta}^{-}, R^2)$ across the cross-section and over time.

def suv_diagnostics(suv_df):
    """Report cross-sectional summary of SUV regression parameters."""
    print("\n=== SUV Regression Diagnostics ===")
    
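    # R^2 is reported when compute_suv has stored it
    params = [c for c in ['alpha_turn', 'beta_pos', 'beta_neg',
                          'r_squared_turn'] if c in suv_df.columns]
    print(suv_df[params].describe(
        percentiles=[.05, .25, .50, .75, .95]
    ).T.to_string(float_format='{:.6f}'.format))
    
    # Asymmetry check (descriptive: rolling windows overlap, so the
    # stock-date coefficients are not independent observations)
    diff = suv_df['beta_pos'] - suv_df['beta_neg']
    print(f"\nbeta_pos - beta_neg: mean = {diff.mean():.6f}, "
          f"frac > 0 = {(diff > 0).mean():.3f}")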

36 Volatility-Based DIVOP Proxies

36.1 Total Return Volatility

36.1.1 Theoretical Motivation

Stock return volatility serves as a proxy for divergence of opinion through several channels. Shalen (1993) develops a model in which both volume and volatility are increasing in the dispersion of investor beliefs. Scheinkman and Xiong (2003) predict that higher volatility reflects the speculative trading component driven by overconfident investors who disagree about value. Empirically, Boehme, Danielsen, and Sorescu (2006) and Chatterjee, John, and Yan (2012) use idiosyncratic volatility as a DIVOP proxy and find it positively correlated with other disagreement measures and negatively associated with subsequent returns when short-sale constraints bind.

36.1.2 Construction

Total return volatility is the standard deviation of daily returns over the rolling estimation window:

\[ \text{VOLATILITY}_{i,t} = \sqrt{\frac{1}{N_i - 1} \sum_{s \in [\tau_1, \tau_2]} (r_{i,s} - \bar{r}_i)^2} \tag{36.1}\]

where $N_i$ is the number of non-missing return observations for stock $i$ in the window $[\tau_1, \tau_2]$.

36.2 Idiosyncratic Volatility (IVOL)

Idiosyncratic volatility isolates firm-specific return variation by removing the systematic component explained by market movements. We compute IVOL from the residuals of a market model:

\[ r_{i,s} = \alpha_i + \beta_i \cdot r_{m,s} + \epsilon_{i,s}, \quad s \in [\tau_1, \tau_2] \tag{36.2}\]

\[ \text{IVOL}_{i,t} = \text{Std}(\hat{\epsilon}_{i,s}) \tag{36.3}\]

Researchers may extend this to a Fama and French (1993) three-factor or five-factor model using Vietnamese factor portfolios constructed elsewhere in this book. A richer factor model yields IVOL estimates that better isolate truly idiosyncratic disagreement, at the cost of requiring factor portfolio construction.

def compute_volatility(df, calendar, config):
    """
    Compute total return volatility and idiosyncratic volatility
    via rolling estimation windows.
    
    Total vol = std(returns) in window.
    IVOL = std(residuals) from market model regression.
    """
    est_window = config['est_window']
    min_obs = int(est_window * config['min_volume_days'])
    
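    # Value-weighted market return. Weights use same-day market cap;
    # lagged market cap is a common alternative that keeps the weights
    # from depending on the return being averaged.
    def _vw_ret(g):
        # Drop rows with a missing return or a missing weight input,
        # otherwise np.average returns NaN for the whole day
        valid = g.dropna(subset=['ret', 'adj_shares_out', 'close'])
        w = valid['adj_shares_out'] * valid['close']
        if valid.empty or w.sum() <= 0:
            return np.nan
        return np.average(valid['ret'], weights=w)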
    
    mkt_ret = df.groupby('date').apply(_vw_ret).reset_index()
    mkt_ret.columns = ['date', 'mkt_ret']
    df = df.merge(mkt_ret, on='date', how='left')
    
    results = []
    grouped = {t: g for t, g in df.groupby('ticker')}
    
    for _, cal_row in calendar.iterrows():
        dt = cal_row['date']
        est_s, est_e = cal_row['est_start'], cal_row['est_end']
        
        for ticker, tdata in grouped.items():
            est = tdata[
                (tdata['date'] >= est_s) & (tdata['date'] <= est_e)
            ].dropna(subset=['ret', 'mkt_ret'])
            
            if len(est) < min_obs:
                continue
            
            # Total volatility
            total_vol = est['ret'].std()
            
            # Market model -> IVOL
            X = est[['mkt_ret']].values
            y = est['ret'].values
            reg = LinearRegression().fit(X, y)
            resid = y - reg.predict(X)
            ivol = np.std(resid, ddof=1)
            
            results.append({
                'ticker': ticker, 'date': dt,
                'total_volatility': total_vol,
                'idio_volatility': ivol,
                'market_beta': reg.coef_[0],
                'market_alpha': reg.intercept_,
                'r_squared_mm': reg.score(X, y),
                'n_vol': len(est),
            })
    
    vol_df = pd.DataFrame(results)
    print(f"Volatility: {len(vol_df):,} stock-date obs")
    print(f"  Total vol (ann. mean): "
          f"{vol_df['total_volatility'].mean() * np.sqrt(252):.4f}")
    print(f"  IVOL (ann. mean): "
          f"{vol_df['idio_volatility'].mean() * np.sqrt(252):.4f}")
    return vol_df
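
The multi-factor extension mentioned before compute_volatility changes only the regressor matrix. The sketch below assumes the daily factor returns for one estimation window are already available as the columns of a NumPy array; the function name ivol_from_factors, the simulated factors, and the loadings are hypothetical. Unlike compute_volatility, which uses ddof=1, this version divides by $T - K - 1$ to account for the estimated coefficients, which is a design choice rather than the chapter's convention.

```python
import numpy as np

def ivol_from_factors(ret: np.ndarray, factors: np.ndarray) -> float:
    """Std. dev. of residuals from an OLS regression of ret on factors.

    ret     : (T,) daily stock returns in the estimation window
    factors : (T, K) daily factor returns (K = 1 gives the market model)
    """
    T, K = factors.shape
    X = np.column_stack([np.ones(T), factors])       # add intercept
    coef, *_ = np.linalg.lstsq(X, ret, rcond=None)
    resid = ret - X @ coef
    # Degrees-of-freedom correction for the K + 1 estimated parameters
    return float(np.sqrt(resid @ resid / (T - K - 1)))

# Simulated window: three factors plus an idiosyncratic shock (sd = 2%)
rng = np.random.default_rng(0)
T = 250
f = rng.normal(0.0, 0.01, size=(T, 3))
eps = rng.normal(0.0, 0.02, size=T)
r = 0.0002 + f @ np.array([1.0, 0.5, -0.3]) + eps

ivol_1f = ivol_from_factors(r, f[:, :1])   # first factor only
ivol_3f = ivol_from_factors(r, f)          # all three factors
print(f"IVOL, 1 factor: {ivol_1f:.4f}   IVOL, 3 factors: {ivol_3f:.4f}")
```

Inside the rolling loop, the same function could be called with the window's factor columns in place of the single mkt_ret regressor.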

36.2.1 Vietnam-Specific Considerations for Volatility

Price limits compress measured volatility. Daily limits of $\pm$ 7% (HOSE) and $\pm$ 10% (HNX) mechanically truncate the return distribution, leading to underestimation of true volatility. On limit-hit days, the true equilibrium return may exceed the observed return. Researchers should be aware that volatility-based DIVOP measures may be downward-biased for stocks that frequently hit limits.
VN-Index concentration. The VN-Index is highly concentrated: the top 10 stocks often account for 50–60% of index weight. For small- and mid-cap stocks, an equal-weighted market return or a composite HOSE+HNX index may provide a better market factor in Equation 36.2.
Thin trading and non-synchronous returns. For thinly traded stocks, consecutive zero-return days can depress measured volatility. The Dimson (1979) adjustment (including lagged and lead market returns in the market model) may help correct for non-synchronous trading bias in the beta estimate, though its effect on IVOL is typically small.
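
The Dimson adjustment described in the last item can be sketched as follows. This is an illustration under stated assumptions, not the chapter's implementation: the function name dimson_beta, the choice of one lag and one lead, and the simulated stock that reacts to market news with a one-day delay are all hypothetical.

```python
import numpy as np
import pandas as pd

def dimson_beta(ret: pd.Series, mkt: pd.Series,
                n_lags: int = 1, n_leads: int = 1) -> float:
    """Sum of OLS slopes on lagged, contemporaneous, and lead market returns."""
    # shift(-k) with k < 0 gives a lag, k = 0 the same day, k > 0 a lead
    cols = {f"m{k:+d}": mkt.shift(-k) for k in range(-n_lags, n_leads + 1)}
    data = pd.concat([ret.rename("ret"), pd.DataFrame(cols)], axis=1).dropna()
    X = np.column_stack([np.ones(len(data)), data[list(cols)].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, data["ret"].to_numpy(), rcond=None)
    return float(coef[1:].sum())

# Simulated thinly traded stock: 40% of its market exposure shows up on
# the same day and 60% one day later (true total beta = 1.0)
rng = np.random.default_rng(1)
T = 2000
m = pd.Series(rng.normal(0.0, 0.01, T))
r = 0.4 * m + 0.6 * m.shift(1).fillna(0) + rng.normal(0.0, 0.005, T)

beta_ols = dimson_beta(r, m, n_lags=0, n_leads=0)   # contemporaneous only
beta_dimson = dimson_beta(r, m)                     # one lag, one lead
print(f"OLS beta: {beta_ols:.3f}   Dimson beta: {beta_dimson:.3f}")
```

In this simulation the contemporaneous regression recovers only the same-day exposure, while the summed coefficients come close to the total exposure, which is the bias the adjustment is meant to address.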

37 Spread-Based and Liquidity DIVOP Proxies

37.1 Bid-Ask Spread (BASPREAD)

37.1.1 Theoretical Motivation

The bid-ask spread reflects the adverse selection costs faced by limit order providers. When investors hold heterogeneous beliefs, each trade is more likely to convey private information, raising the adverse selection component of the spread. Handa, Schwartz, and Tiwari (2003) show that in order-driven markets the spread widens when divergence of opinion increases because limit order providers face greater risk of being picked off by informed traders. Chung and Zhang (2014) demonstrate that closing bid-ask spreads from daily data provide a reliable approximation to intraday effective spreads.

37.1.2 Construction

We compute the proportional bid-ask spread using end-of-day quote data:

\[ \text{BASPREAD}_{i,t} = \frac{\text{Ask}_{i,t} - \text{Bid}_{i,t}}{\text{Midpoint}_{i,t}} \tag{37.1}\]

where $\text{Midpoint}_{i,t} = (\text{Ask}_{i,t} + \text{Bid}_{i,t}) / 2$. When end-of-day bid and ask are unavailable, we use the daily high-low range as a fallback. Following Chung and Zhang (2014), we delete observations where both Bid and Ask are zero, and where the spread exceeds 50% of the midpoint.

37.2 Amihud Illiquidity (ILLIQ)

The Amihud (2002) ratio measures the price impact of order flow:

\[ \text{ILLIQ}_{i,t} = \frac{|r_{i,t}|}{\text{DolVol}_{i,t}} \tag{37.2}\]

where $\text{DolVol}_{i,t} = \text{Volume}_{i,t} \times \text{Price}_{i,t}$, expressed in billions of VND for scaling. Higher ILLIQ indicates a larger price move per unit of trading value, which is commonly interpreted as reflecting greater information asymmetry. We average daily ratios within each month and take logs because the distribution is heavily right-skewed.

def compute_spread_and_illiq(df, config):
    """Compute bid-ask spread (BASPREAD) and Amihud illiquidity."""
    df = df.copy()
    
    # --- Bid-Ask Spread ---
    df['midpoint_ba'] = (df['ask'] + df['bid']) / 2
    df['baspread_ba'] = np.where(
        (df['ask'] > 0) & (df['bid'] > 0) & (df['midpoint_ba'] > 0),
        (df['ask'] - df['bid']) / df['midpoint_ba'], np.nan
    )
    
    # Fallback: high/low range
    df['midpoint_hl'] = (df['high'] + df['low']) / 2
    df['baspread_hl'] = np.where(
        (df['high'] > 0) & (df['low'] > 0) & (df['midpoint_hl'] > 0),
        (df['high'] - df['low']) / df['midpoint_hl'], np.nan
    )
    
    df['baspread'] = df['baspread_ba'].fillna(df['baspread_hl'])
    df['midpoint'] = df['midpoint_ba'].fillna(df['midpoint_hl'])
    
    # Chung & Zhang (2014) filters
    bad = (df['baspread'].isna()) | \
          (df['baspread'] > config['max_spread_pct']) | \
          (df['baspread'] < 0)
    df.loc[bad, 'baspread'] = np.nan
    
    # --- Amihud Illiquidity ---
    df['dollar_vol'] = df['volume'] * df['close'] / 1e9
    df['amihud_daily'] = np.where(
        df['dollar_vol'] > 0,
        np.abs(df['ret']) / df['dollar_vol'], np.nan
    )
    
    print(f"BASPREAD: {df['baspread'].notna().sum():,} valid obs, "
          f"mean = {df['baspread'].mean():.6f}")
    print(f"AMIHUD: {df['amihud_daily'].notna().sum():,} valid obs, "
          f"mean = {df['amihud_daily'].mean():.6f}")
    return df


def compute_amihud_monthly(df):
    """Monthly Amihud = mean daily |ret|/dollar_vol (min 15 days)."""
    df = df.copy()
    df['ym'] = df['date'].dt.to_period('M')
    agg = df.groupby(['ticker', 'ym']).agg(
        illiq_mean=('amihud_daily', 'mean'),
        n_days=('amihud_daily', 'count'),
    ).reset_index()
    agg = agg[agg['n_days'] >= 15].copy()
    agg['log_illiq'] = np.log(agg['illiq_mean'] + 1e-10)
    return agg

37.2.1 Vietnam-Specific Considerations for Spread and Liquidity

Tick size schedule. Vietnam uses variable tick sizes: 10 VND (prices < 10,000), 50 VND (10,000–49,950), and 100 VND (≥ 50,000) on HOSE. These impose a floor on quoted spreads for low-priced stocks. Researchers should be cautious interpreting cross-price-decile spread variation as reflecting opinion divergence rather than tick-size mechanics.
Order-driven market structure. Both HOSE and HNX are pure order-driven markets where public limit orders provide liquidity. This makes the Chung and Zhang (2014) CRSP-based spread approximation appropriate.
Lot size requirements. HOSE requires 100-share standard lots for continuous trading. For high-priced stocks, the standard lot represents a large capital commitment, potentially inflating quoted spreads relative to effective trading costs.
Call auction effects. Opening and closing sessions on HOSE use periodic call auctions, which can produce bid-ask quotes that differ substantially from continuous-trading spreads.
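
To see how the tick size schedule bounds quoted spreads from below, the schedule quoted above can be turned into a one-tick spread floor. This is a small sketch: the function names are illustrative, the schedule is the HOSE schedule as stated in this section, and the floor is expressed relative to price rather than the quote midpoint, which is an approximation.

```python
def hose_tick_size(price: float) -> int:
    """Tick size in VND under the HOSE schedule quoted in the text."""
    if price < 10_000:
        return 10
    if price < 50_000:
        return 50
    return 100

def min_relative_spread(price: float) -> float:
    """Smallest possible quoted spread (one tick) relative to price."""
    return hose_tick_size(price) / price

for p in (5_000, 10_000, 49_950, 50_000, 200_000):
    print(f"{p:>8,} VND: tick {hose_tick_size(p):>3} VND, "
          f"spread floor {min_relative_spread(p):.3%}")
```

The floor jumps at each tick-size boundary, so a comparison of BASPREAD across price levels picks up this mechanical pattern in addition to any variation in disagreement. Controlling for the one-tick floor or for price level is one way to separate the two.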

38 Analyst Forecast Dispersion

38.1 Theoretical Motivation

Analyst forecast dispersion, the cross-sectional standard deviation of individual analysts’ earnings forecasts, is the most direct measure of divergence of opinion. Unlike market-based proxies that capture disagreement indirectly, forecast dispersion directly measures disagreement among informed market participants. Abarbanell, Lanen, and Verrecchia (1995) establish the theoretical basis, and Diether, Malloy, and Scherbina (2002) demonstrate that stocks with higher analyst forecast dispersion earn lower subsequent returns, consistent with the Miller overpricing hypothesis.

38.2 Data Challenges in Vietnam

Constructing analyst forecast dispersion in Vietnam presents substantial challenges relative to the U.S.:

Coverage breadth. While I/B/E/S covers over 4,000 U.S. companies, only 100–150 Vietnamese firms typically have coverage by at least 3 analysts, concentrated among VN30 constituents.
Data sources. Analyst forecasts are available from DataCore.vn, FiinPro, Bloomberg, and Refinitiv. The choice of source affects coverage and timeliness.
Forecast staleness. With limited coverage, forecasts may go unrevised for months. Following I/B/E/S methodology, we carry each forecast forward for a maximum of 105 days.

38.3 Construction Methodology

The construction proceeds as follows:

Clean individual forecasts. Remove observations where the announcement date falls after the review date, since a forecast cannot be reviewed before it is issued. Keep only annual EPS forecasts. For each analyst-ticker-fiscal period, retain only the latest forecast per calendar month.
Handle stopped and excluded estimates. Remove forecasts where the analyst has left the brokerage or the estimate has been excluded from consensus.
Carry forward with staleness control. Each forecast is valid until the earlier of: (a) the next forecast by the same analyst, (b) 105 days after the announcement, or (c) the actual earnings announcement date.
Expand to monthly frequency. For each ticker-month, identify all valid outstanding forecasts and compute dispersion.
Compute scaled measures:

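\[ \text{DISP1}_{i,m} = \frac{\text{Std}(\hat{\text{EPS}}_{i,m}^{(a)})}{|\text{Mean}(\hat{\text{EPS}}_{i,m}^{(a)})|} \qquad \text{DISP2}_{i,m} = \frac{\text{Std}(\hat{\text{EPS}}_{i,m}^{(a)})}{\bar{P}_{i,m}} \]

where $\hat{\text{EPS}}_{i,m}^{(a)}$ is analyst $a$’s outstanding forecast for firm $i$ in month $m$, the standard deviation and mean are taken across analysts, and $\bar{P}_{i,m}$ is the mean stock price in month $m$.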

def construct_analyst_dispersion(forecasts_df, price_df, config):
    """
    Construct analyst forecast dispersion measures.
    
    Parameters
    ----------
    forecasts_df : pd.DataFrame
        Individual analyst forecasts with: ticker, analyst_id, broker,
        fpedats, anndats, revdats, value (EPS), anndats_act.
    price_df : pd.DataFrame
        Monthly price: ticker, month, mean_price.
    config : dict
        With min_analysts, forecast_carry_days.
    """
    carry_days = config['forecast_carry_days']
    min_analysts = config['min_analysts']
    
    df = forecasts_df.copy()
    df = df[df['anndats'] <= df['revdats']].copy()
    df = df.dropna(subset=['fpedats', 'anndats', 'value'])
    
    # Latest forecast per analyst-month
    df['ym'] = df['anndats'].dt.to_period('M')
    df = df.sort_values(
        ['ticker', 'fpedats', 'analyst_id', 'ym', 'anndats', 'revdats']
    )
    df = df.groupby(['ticker', 'fpedats', 'analyst_id', 'ym']).tail(1)
    
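    # Carry-forward end date. Sort chronologically so that shift(-1)
    # returns the same analyst's NEXT (later) forecast; with a descending
    # sort it would return the previous one and end every carry window
    # before it starts.
    df = df.sort_values(
        ['ticker', 'analyst_id', 'fpedats', 'anndats']
    )
    df['next_ann'] = df.groupby(
        ['ticker', 'analyst_id', 'fpedats']
    )['anndats'].shift(-1)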
    
    def _carry_end(row):
        candidates = [row['anndats'] + pd.Timedelta(days=carry_days)]
        if pd.notna(row.get('next_ann')):
            candidates.append(row['next_ann'])
        if pd.notna(row.get('anndats_act')):
            candidates.append(row['anndats_act'])
        return min(candidates)
    
    df['carry_end'] = df.apply(_carry_end, axis=1)
    
    # Monthly expansion
    months = pd.period_range(config['beg_date'], config['end_date'], freq='M')
    records = []
    for month in months:
        me = month.to_timestamp(how='end')
        valid = df[(df['anndats'] <= me) & (df['carry_end'] > me)].copy()
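        valid = valid[valid['fpedats'] > me]
        # Use one fiscal period per ticker (the nearest upcoming period end)
        # so that dispersion does not mix forecasts for different years
        valid = valid[valid['fpedats'] ==
                      valid.groupby('ticker')['fpedats'].transform('min')]
        valid = valid.sort_values(['ticker', 'analyst_id', 'anndats'])
        valid = valid.groupby(['ticker', 'analyst_id']).tail(1)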
        
        disp = valid.groupby('ticker').agg(
            n_analysts=('analyst_id', 'nunique'),
            mean_fcst=('value', 'mean'),
            std_fcst=('value', 'std'),
        ).reset_index()
        disp['month'] = month
        records.append(disp)
    
    if not records:
        return pd.DataFrame()
    disp_df = pd.concat(records, ignore_index=True)
    
    # Scaled measures
    disp_df['disp1'] = np.where(
        disp_df['mean_fcst'].abs() > 0,
        disp_df['std_fcst'] / disp_df['mean_fcst'].abs(), np.nan
    )
    disp_df = disp_df.merge(price_df, on=['ticker', 'month'], how='left')
    disp_df['disp2'] = np.where(
        disp_df['mean_price'] > 0,
        disp_df['std_fcst'] / disp_df['mean_price'], np.nan
    )
    disp_df['disp_raw'] = disp_df['std_fcst']
    
    out = disp_df[disp_df['n_analysts'] >= min_analysts].copy()
    print(f"DISP: {len(out):,} ticker-months (>= {min_analysts} analysts)")
    print(f"  Mean analysts: {out['n_analysts'].mean():.1f}")
    return out

38.4 Scaling Considerations

Following Cheong and Thomas (2011), we note that each scaling choice has pitfalls. DISP1 (scaled by absolute mean forecast) can produce extreme values when the mean forecast approaches zero—common for Vietnamese firms near breakeven. DISP2 (scaled by price) introduces a mechanical negative correlation between price and scaled dispersion. We recommend reporting all three versions (DISP1, DISP2, and unscaled DISP_RAW with $\ln(\text{Price})$ as an additional control), and winsorizing DISP1 at the 1st and 99th percentiles.
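
The winsorization recommended above can be sketched as follows. This is a minimal illustration: the helper name winsorize and the simulated DISP1 values are assumptions, and the cutoffs are computed on the pooled sample rather than month by month.

```python
import numpy as np
import pandas as pd

def winsorize(s: pd.Series, lower: float = 0.01,
              upper: float = 0.99) -> pd.Series:
    """Clip a series at its empirical lower/upper quantiles (NaNs are kept)."""
    lo, hi = s.quantile([lower, upper])
    return s.clip(lower=lo, upper=hi)

# Simulated right-skewed DISP1 values with one extreme ratio of the kind
# produced when the mean forecast is close to zero
rng = np.random.default_rng(42)
disp1 = pd.Series(rng.lognormal(mean=-2.0, sigma=1.0, size=1_000))
disp1.iloc[0] = 250.0

disp1_w = winsorize(disp1)
print(f"max before: {disp1.max():.2f}   max after: {disp1_w.max():.2f}")
print(f"mean before: {disp1.mean():.4f}   mean after: {disp1_w.mean():.4f}")
```

To winsorize within each cross-section instead, the same helper can be applied by month, for example with disp_df.groupby('month')['disp1'].transform(winsorize).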

Caution on Analyst Dispersion in Thin-Coverage Markets

With typical coverage of 5–10 analysts per firm in Vietnam (versus 15–25 in the U.S.), forecast dispersion is estimated with substantially greater noise. A dispersion measure from 3 analysts has a very different sampling distribution than one from 20. Always include the number of analysts as a control and test robustness with varying minimum-analyst thresholds (3, 5, 7).
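
A short simulation illustrates how much noisier the dispersion estimate is with few analysts. It is a sketch under strong assumptions: forecasts are independent normal draws around a common mean with a true standard deviation of 1, and the function name simulated_dispersion is hypothetical. Real forecasts are neither independent nor normal, so the numbers indicate orders of magnitude only.

```python
import numpy as np

rng = np.random.default_rng(7)
true_sd = 1.0      # true cross-analyst dispersion (arbitrary units)
n_sims = 20_000    # simulated firm-months per coverage level

def simulated_dispersion(n_analysts: int) -> np.ndarray:
    """Sample std. dev. of n_analysts forecasts, repeated n_sims times."""
    draws = rng.normal(0.0, true_sd, size=(n_sims, n_analysts))
    return draws.std(axis=1, ddof=1)

for n in (3, 5, 20):
    d = simulated_dispersion(n)
    p5, p95 = np.percentile(d, [5, 95])
    print(f"n = {n:>2}: 5th-95th percentile of estimated dispersion "
          f"= [{p5:.2f}, {p95:.2f}] (true value {true_sd:.2f})")
```

The interval printed for three analysts is much wider than the one for twenty, which is why the number of analysts belongs among the controls and why results should be checked across minimum-coverage thresholds.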

39 Cross-Sectional Correlations Among DIVOP Proxies

An important empirical question is the degree to which the various DIVOP proxies capture the same underlying construct. If divergence of opinion is a well-defined latent variable, we expect positive correlations among all proxies, though correlations need not be high since each captures a different facet of disagreement.

def compute_divop_correlations(merged_df, proxies=None):
    """
    Compute and visualize Spearman correlations among DIVOP proxies.
    We use rank correlations because many proxies are right-skewed.
    """
    if proxies is None:
        proxies = [
            'dto', 'suv', 'total_volatility', 'idio_volatility',
            'baspread', 'amihud_daily', 'disp1', 'disp2'
        ]
    available = [p for p in proxies if p in merged_df.columns]
    data = merged_df[available].dropna()
    
    n = len(available)
    rho_mat = np.eye(n)
    p_mat = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho, p = scipy_stats.spearmanr(
                data[available[i]], data[available[j]]
            )
            rho_mat[i, j] = rho_mat[j, i] = rho
            p_mat[i, j] = p_mat[j, i] = p
    
    labels = {'dto': 'DTO', 'suv': 'SUV',
              'total_volatility': 'VOL', 'idio_volatility': 'IVOL',
              'baspread': 'SPREAD', 'amihud_daily': 'ILLIQ',
              'disp1': 'DISP1', 'disp2': 'DISP2'}
    pretty = [labels.get(c, c) for c in available]
    corr_df = pd.DataFrame(rho_mat, index=pretty, columns=pretty)
    
    # Heatmap
    fig, ax = plt.subplots(figsize=(9, 7))
    mask = np.triu(np.ones_like(corr_df, dtype=bool), k=1)
    sns.heatmap(
        corr_df, mask=mask, annot=True, fmt='.3f',
        cmap='RdBu_r', center=0, vmin=-0.4, vmax=0.7,
        square=True, linewidths=0.5,
        cbar_kws={'shrink': 0.8, 'label': 'Spearman ρ'}, ax=ax
    )
    ax.set_title('Spearman Correlations Among DIVOP Proxies\n'
                  'Vietnamese Equity Market', fontsize=13, fontweight='bold')
    plt.tight_layout()
    plt.savefig('divop_correlations.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    return corr_df

39.0.1 Expected Correlation Patterns

Based on U.S. evidence and theory, we expect:

Table 39.1: Expected correlation structure among DIVOP proxies

Pair	Expected	Rationale
DTO × SUV	High positive	Both capture abnormal volume; SUV refines DTO
VOL × IVOL	High positive	IVOL is the firm-specific component of total volatility
SPREAD × ILLIQ	Moderate-high positive	Both capture information asymmetry
Volume × Volatility	Moderate positive	Shalen (1993) links both to belief dispersion
Analyst × Market-based	Weak-moderate positive	Different investor populations

40 Descriptive Statistics and Cross-Sectional Properties

40.1 Summary Statistics

def descriptive_statistics(merged_df):
    """Comprehensive descriptive statistics for DIVOP proxies."""
    proxies = {
        'dto': 'Unexplained Volume (DTO)',
        'suv': 'Std Unexplained Volume (SUV)',
        'total_volatility': 'Total Return Volatility',
        'idio_volatility': 'Idiosyncratic Volatility',
        'baspread': 'Bid-Ask Spread',
        'amihud_daily': 'Amihud Illiquidity',
        'disp1': 'Analyst Disp (mean-scaled)',
        'disp2': 'Analyst Disp (price-scaled)',
    }
    avail = {k: v for k, v in proxies.items() if k in merged_df.columns}
    rows = []
    for col, label in avail.items():
        s = merged_df[col].dropna()
        rows.append({
            'Proxy': label, 'N': f'{len(s):,}',
            'Mean': f'{s.mean():.6f}', 'Std': f'{s.std():.6f}',
            'P5': f'{s.quantile(.05):.6f}',
            'Median': f'{s.median():.6f}',
            'P95': f'{s.quantile(.95):.6f}',
            'Skew': f'{s.skew():.2f}',
            'Kurt': f'{s.kurtosis():.2f}',
        })
    stats = pd.DataFrame(rows).set_index('Proxy')
    print("\n" + "=" * 90)
    print("Descriptive Statistics of DIVOP Proxies")
    print("Vietnamese Equity Market, HOSE and HNX")
    print("=" * 90)
    print(stats.to_string())
    return stats

40.2 DIVOP by Firm Characteristics

def divop_by_size(merged_df):
    """Mean DIVOP proxies by market-cap quintile."""
    df = merged_df.copy()
    df['mkt_cap'] = df['close'] * df['shares_outstanding']
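    size_labels = ['Q1 Small', 'Q2', 'Q3', 'Q4', 'Q5 Large']
    # Rank within each date before binning: passing labels together with
    # duplicates='drop' raises an error whenever tied values remove a bin
    df['size_q'] = df.groupby('date')['mkt_cap'].transform(
        lambda x: pd.qcut(x.rank(method='first'), 5,
            labels=False, duplicates='drop')
    ).map(dict(enumerate(size_labels)))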
    proxies = ['dto','suv','total_volatility','idio_volatility',
               'baspread','amihud_daily']
    avail = [p for p in proxies if p in df.columns]
    tab = df.groupby('size_q')[avail].mean()
    print("\n=== Mean DIVOP by Size Quintile ===")
    print(tab.to_string(float_format='{:.6f}'.format))
    return tab

def divop_by_exchange(merged_df):
    """Compare mean DIVOP across HOSE and HNX."""
    proxies = ['dto','suv','total_volatility','idio_volatility',
               'baspread','amihud_daily']
    avail = [p for p in proxies if p in merged_df.columns]
    tab = merged_df.groupby('exchange')[avail].mean()
    print("\n=== Mean DIVOP by Exchange ===")
    print(tab.to_string(float_format='{:.6f}'.format))
    return tab

40.3 Time-Series Evolution

def plot_divop_timeseries(merged_df):
    """Plot monthly cross-sectional median DIVOP with crisis shading."""
    df = merged_df.copy()
    df['ym'] = df['date'].dt.to_period('M')
    proxies = ['dto','suv','total_volatility','baspread']
    avail = [p for p in proxies if p in df.columns]
    monthly = df.groupby('ym')[avail].median()
    monthly.index = monthly.index.to_timestamp()
    
    fig, axes = plt.subplots(len(avail), 1,
        figsize=(13, 3.5*len(avail)), sharex=True)
    if len(avail) == 1: axes = [axes]
    
    labels = {'dto':'DTO','suv':'SUV',
              'total_volatility':'Volatility','baspread':'Spread'}
    colors = ['#1976D2','#388E3C','#F57C00','#D32F2F']
    
    for i, (proxy, ax) in enumerate(zip(avail, axes)):
        ax.plot(monthly.index, monthly[proxy],
                color=colors[i], linewidth=1.3)
        ax.set_ylabel(labels.get(proxy, proxy), fontsize=10)
        ax.grid(True, alpha=0.25)
        for s, e, c in [('2008-01','2009-06','red'),
                         ('2020-01','2020-12','orange'),
                         ('2022-09','2023-06','purple')]:
            ax.axvspan(pd.Timestamp(s), pd.Timestamp(e),
                        alpha=0.1, color=c)
    
    axes[0].set_title(
        'Time-Series of DIVOP Proxies\n'
        'Monthly Cross-Sectional Median, HOSE & HNX',
        fontsize=13, fontweight='bold')
    from matplotlib.patches import Patch
    axes[-1].legend(handles=[
        Patch(facecolor='red', alpha=.2, label='GFC 2008-09'),
        Patch(facecolor='orange', alpha=.2, label='COVID-19'),
        Patch(facecolor='purple', alpha=.2, label='Bond Crisis 2022-23'),
    ], loc='upper right', fontsize=8)
    plt.tight_layout()
    plt.savefig('divop_timeseries.png', dpi=300, bbox_inches='tight')
    plt.show()

41 Putting It All Together

def build_divop_dataset(config):
    """
    Master pipeline: load data, construct all DIVOP proxies,
    merge into a single stock-date panel.
    """
    df = load_daily_data(config)
    df = apply_sample_filters(df, config)
    df = adjust_for_corporate_actions(df)
    calendar = build_trading_calendar(df, config)
    
    df = compute_dto(df, config)
    suv_df = compute_suv(df, calendar, config)
    vol_df = compute_volatility(df, calendar, config)
    df = compute_spread_and_illiq(df, config)
    
    # Merge
    base = df[['ticker','date','ret','close','volume',
                'shares_outstanding','exchange','industry_icb',
                'foreign_ownership_pct','turnover',
                'mato','dto','baspread','amihud_daily','limit_hit']].copy()
    
    if not suv_df.empty:
        base = base.merge(
            suv_df[['ticker','date','suv','predicted_turnover']],
            on=['ticker','date'], how='left')
    if not vol_df.empty:
        base = base.merge(
            vol_df[['ticker','date','total_volatility',
                     'idio_volatility','market_beta']],
            on=['ticker','date'], how='left')
    
    print(f"\n=== Final DIVOP Dataset ===")
    print(f"Shape: {base.shape}")
    print(f"Tickers: {base['ticker'].nunique()}")
    return base

42 Empirical Applications

42.1 Application 1: DIVOP and the Cross-Section of Returns

The fundamental test of the Miller hypothesis is whether stocks with higher divergence of opinion earn lower subsequent returns. We implement Fama-MacBeth cross-sectional regressions:

\[ r_{i,t+1:t+h} = \gamma_{0,t} + \gamma_{1,t} \cdot \text{DIVOP}_{i,t} + \gamma_{2,t}' \mathbf{X}_{i,t} + \varepsilon_{i,t} \]

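where $\mathbf{X}_{i,t}$ includes controls for market beta, log market capitalization, and log book-to-market ratio. The Miller hypothesis predicts $\bar{\gamma}_1 < 0$, where $\bar{\gamma}_1$ is the time-series average of the cross-sectional slopes $\gamma_{1,t}$. The implementation that follows defaults to the first two controls; log book-to-market can be supplied through the controls argument when book equity data are available.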

def fama_macbeth_divop(merged_df, divop_proxy='suv',
                        controls=None, horizon=21):
    """
    Fama-MacBeth cross-sectional regressions.
    Miller predicts gamma_1 < 0; Varian predicts gamma_1 > 0.
    """
    if controls is None:
        controls = ['market_beta', 'log_mktcap']
    
    df = merged_df.copy()
    df = df.sort_values(['ticker', 'date'])
    df['fwd_ret'] = df.groupby('ticker')['ret'].transform(
        lambda x: x.shift(-1).rolling(horizon).sum().shift(-(horizon-1))
    )
    df['log_mktcap'] = np.log(
        df['close'] * df['shares_outstanding'] + 1
    )
    
    reg_vars = ['fwd_ret', divop_proxy] + \
               [c for c in controls if c in df.columns]
    df_reg = df[['ticker','date'] + reg_vars].dropna()
    
    from numpy.linalg import lstsq
    results = []
    for date, cross in df_reg.groupby('date'):
        if len(cross) < 30: continue
        y = cross['fwd_ret'].values
        X_cols = [divop_proxy] + [c for c in controls if c in cross.columns]
        X = np.column_stack([np.ones(len(cross)), cross[X_cols].values])
        try:
            coefs, _, _, _ = lstsq(X, y, rcond=None)
            results.append({
                'date': date, 'intercept': coefs[0],
                f'gamma_{divop_proxy}': coefs[1], 'n': len(cross),
            })
        except Exception: continue
    
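    fm = pd.DataFrame(results)
    gc = f'gamma_{divop_proxy}'
    g = fm[gc].to_numpy()
    T = len(g)
    mu = g.mean()
    # Daily regressions on h-day forward returns overlap, so the gamma
    # series is autocorrelated and std / sqrt(T) overstates precision.
    # Use a Newey-West (Bartlett kernel) standard error with h - 1 lags.
    u = g - mu
    n_lags = max(min(horizon - 1, T - 1), 0)
    lrv = u @ u / T
    for k in range(1, n_lags + 1):
        lrv += 2 * (1 - k / (n_lags + 1)) * (u[k:] @ u[:-k]) / T
    se = np.sqrt(lrv / T)
    t = mu / se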
    
    print(f"\n=== Fama-MacBeth: {divop_proxy} -> "
          f"{horizon}-day fwd returns ===")
    print(f"  Mean gamma: {mu:.6f}, t-stat: {t:.3f}")
    if t < -1.96:   print("  -> Supports Miller (1977)")
    elif t > 1.96:   print("  -> Supports Varian (1985)")
    else:            print("  -> Inconclusive at 5%")
    return fm

42.2 Application 2: DIVOP and Earnings Announcements

Following Berkman et al. (2009), we test whether high-DIVOP stocks experience negative abnormal returns around earnings announcements, as uncertainty resolution reduces the optimism premium.

def divop_earnings_event(merged_df, ea_dates_df,
                          divop_proxy='suv', window=(-1, 3)):
    """
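    Sort announcements into DIVOP quintiles using the last DIVOP value
    observed at least 5 calendar days before the announcement.
    Returns the event table with a 'divop_q' column; CARs over `window`
    still have to be computed from daily returns.
    Miller predicts: Q5 (high DIVOP) has lower CAR than Q1 (low DIVOP).
    """
    df = merged_df.copy()
    ea = ea_dates_df.copy()
    
    # Pre-EA DIVOP value: latest trading day on or before ea_date - 5 days.
    # An exact-date merge would drop events whose pre_date falls on a
    # weekend or holiday; merge_asof looks back up to 10 calendar days.
    ea['pre_date'] = ea['ea_date'] - pd.Timedelta(days=5)
    pre = (df[['ticker', 'date', divop_proxy]]
           .dropna(subset=[divop_proxy])
           .sort_values('date'))
    ea = pd.merge_asof(
        ea.sort_values('pre_date'), pre,
        left_on='pre_date', right_on='date', by='ticker',
        direction='backward', tolerance=pd.Timedelta(days=10)
    ).dropna(subset=[divop_proxy])
    # Rank before binning so tied values cannot collapse quintile edges
    ea['divop_q'] = pd.qcut(
        ea[divop_proxy].rank(method='first'), 5,
        labels=['Q1 Low','Q2','Q3','Q4','Q5 High']
    )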
    
    print(f"\n=== EA Event Study by {divop_proxy} quintile ===")
    print(f"  Window: ({window[0]}, {window[1]}) days")
    print(f"  Miller predicts: Q5 has lower CAR than Q1")
    return ea
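
The function above returns the quintile assignments; the CAR itself still has to be computed from daily returns. A minimal sketch of that step follows, using market-adjusted returns summed over the event window in trading days. The helper `event_car`, the market-return column `mkt_ret`, and the toy data are assumptions made for illustration, and a market-model or factor-model benchmark could be substituted.

```python
import numpy as np
import pandas as pd

def event_car(daily, events, window=(-1, 3)):
    """Market-adjusted CAR over [window[0], window[1]] trading days.

    daily  : columns ticker, date, ret, mkt_ret (mkt_ret is an assumed name)
    events : columns ticker, ea_date (plus any labels such as divop_q)
    """
    daily = daily.sort_values(['ticker', 'date'])
    by_ticker = {k: g.reset_index(drop=True)
                 for k, g in daily.groupby('ticker')}
    cars = []
    for row in events.itertuples(index=False):
        g = by_ticker.get(row.ticker)
        car = np.nan
        if g is not None:
            # Day 0 = first trading day on or after the announcement date
            pos = int(g['date'].searchsorted(row.ea_date, side='left'))
            lo, hi = pos + window[0], pos + window[1]
            # Require the full window to be observed
            if pos < len(g) and lo >= 0 and hi < len(g):
                ar = g['ret'].iloc[lo:hi + 1] - g['mkt_ret'].iloc[lo:hi + 1]
                car = ar.sum()
        cars.append(car)
    out = events.copy()
    out['car'] = cars
    return out

# Toy data: one stock earning 0.6% per day above the market
dates = pd.bdate_range('2024-01-01', periods=10)
daily = pd.DataFrame({'ticker': 'AAA', 'date': dates,
                      'ret': 0.01, 'mkt_ret': 0.004})
events = pd.DataFrame({'ticker': ['AAA', 'AAA'],
                       'ea_date': [dates[4], dates[9]],
                       'divop_q': ['Q5 High', 'Q5 High']})
res = event_car(daily, events)
print(res)
```

With real data, `res.groupby('divop_q')['car'].mean()` would give the quintile averages, and the Q5 minus Q1 difference is the quantity the Miller prediction concerns. Events whose window runs past the available data are left as missing rather than truncated.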

42.3 Application 3: Composite DIVOP Index via PCA

When a single summary measure of disagreement is needed, PCA on the battery of standardized proxies extracts the common “disagreement factor.”

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def composite_divop_pca(merged_df, proxies=None):
    """Extract first principal component from standardized DIVOP proxies."""
    if proxies is None:
        proxies = ['dto','suv','total_volatility','idio_volatility',
                   'baspread','amihud_daily']
    avail = [p for p in proxies if p in merged_df.columns]
    # .copy() so the composite can be assigned without a pandas warning
    data = merged_df[['ticker','date'] + avail].dropna().copy()
    
    scaler = StandardScaler()
    X = scaler.fit_transform(data[avail])
    
    # Keep up to 3 components; fewer if fewer proxies are available
    n_comp = min(3, len(avail))
    pca = PCA(n_components=n_comp)
    factors = pca.fit_transform(X)
    
    # PC signs are arbitrary: orient PC1 so its loadings are positive on
    # net, and apply the same sign to the reported loadings
    sign = -1.0 if pca.components_[0].sum() < 0 else 1.0
    data['divop_composite'] = sign * factors[:, 0]
    
    loadings = pd.DataFrame(
        pca.components_.T, index=avail,
        columns=[f'PC{i+1}' for i in range(n_comp)]
    )
    loadings['PC1'] *= sign
    
    loadings_str = loadings.to_string(float_format='{:.4f}'.format)
    print(f"\n=== PCA Composite DIVOP ===")
    print(f"Variance explained: "
          f"{pca.explained_variance_ratio_.round(3)}")
    print(f"\nLoadings:\n{loadings_str}")
    return data[['ticker','date','divop_composite']], loadings
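
To build intuition for what the first component recovers, the sketch below simulates four proxies that all load on one unobserved "disagreement" factor and checks how closely PC1 tracks it. Everything here is synthetic: the latent factor, the sensitivities in `betas`, and the noise scale are made-up assumptions, and NumPy's SVD is used in place of scikit-learn only to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
latent = rng.normal(size=n)               # unobserved disagreement factor
betas = np.array([0.9, 0.8, 0.7, 0.6])    # assumed proxy sensitivities
X = latent[:, None] * betas + rng.normal(scale=0.5, size=(n, 4))

# Standardize each proxy, then take PC1 from the SVD of the standardized data
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pc1_loadings = Vt[0]

# The sign of a principal component is arbitrary; orient it so that
# the loadings are positive on net
if pc1_loadings.sum() < 0:
    pc1_loadings = -pc1_loadings

pc1 = Z @ pc1_loadings
explained = S**2 / np.sum(S**2)           # variance share per component
corr = np.corrcoef(pc1, latent)[0, 1]

print("PC1 loadings:", pc1_loadings.round(3))
print("Variance explained:", explained.round(3))
print("Correlation of PC1 with latent factor:", round(corr, 3))
```

In this idealized setting the proxies share a single strong factor, so PC1 absorbs most of the variance. Real proxies also pick up liquidity and information-asymmetry effects, so a smaller share for PC1 should be expected, and the loadings are worth inspecting before interpreting the composite as disagreement.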

43 Conclusion and Practical Recommendations

This chapter has provided a comprehensive methodology for constructing seven distinct proxies for divergence of investor opinion adapted to the Vietnamese equity market. We conclude with practical recommendations:

1. Prefer multiple proxies. No single DIVOP measure is without limitations. We recommend constructing and reporting results for at least three proxies spanning different economic channels (for example, volume, volatility, and either spreads or analyst forecasts).

2. Account for Vietnam-specific microstructure. Daily price limits, T+2 settlement, foreign ownership constraints, and the order-driven market structure all affect DIVOP properties. Flag limit-hit days, include exchange fixed effects, and control for foreign ownership.

3. Vietnam as a natural laboratory for Miller (1977). The absence of short selling through 2024 and the dominance of retail investors create conditions that closely match Miller’s theoretical setting. The introduction of short selling in 2025 creates a natural experiment for examining how relaxation of short-sale constraints affects the DIVOP-return relation.

4. Control for analyst coverage when using DISP measures. With typical coverage of 5–10 analysts per firm, forecast dispersion is estimated with greater noise than in developed markets. Always include the number of analysts as a control variable and conduct robustness checks with varying minimum-analyst thresholds.

5. Consider constructing a composite index. When researchers need a single summary measure of disagreement, the PCA-based composite index described in Chapter 42 provides a principled approach to aggregating information across the individual proxies. The first principal component typically explains 30-50% of the common variation in the battery of DIVOP measures.

6. Winsorize aggressively. Several DIVOP proxies (particularly DISP1, Amihud ILLIQ, and SUV) exhibit extreme outliers in the Vietnamese data. Winsorization at the 1st and 99th percentiles (or even 2nd and 98th for DISP1) is essential for obtaining reliable regression results.

7. Be cautious about causal inference. DIVOP proxies are endogenous: they respond to the same firm characteristics (size, leverage, growth) that also affect returns. Researchers should use appropriate controls, consider instrumental variables where feasible, and be explicit about the limitations of their identification strategy.
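
The winsorization in recommendation 6 can be sketched as follows. This minimal version clips each proxy at its 1st and 99th percentiles within each date, which is one common choice; pooling across the full sample is an alternative. The helper name `winsorize_by_date` and the toy column values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def winsorize_by_date(df, cols, lower=0.01, upper=0.99, date_col='date'):
    """Clip each column to its per-date [lower, upper] quantiles."""
    out = df.copy()
    for col in cols:
        grp = out.groupby(date_col)[col]
        # Per-date thresholds, broadcast back to every row of that date
        lo = grp.transform(lambda s: s.quantile(lower))
        hi = grp.transform(lambda s: s.quantile(upper))
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out

# Toy cross-section: 101 stocks on one date with values 0, 1, ..., 100
toy = pd.DataFrame({'date': pd.Timestamp('2024-01-02'),
                    'amihud_daily': np.arange(101.0)})
w = winsorize_by_date(toy, ['amihud_daily'])
print(w['amihud_daily'].min(), w['amihud_daily'].max())
```

For DISP1, the tighter 2nd and 98th percentile cut mentioned above corresponds to `lower=0.02, upper=0.98`.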

The DIVOP framework is particularly relevant for the Vietnamese market at this point in its development. As the market matures toward potential FTSE Emerging Market reclassification, as short selling becomes more widely available, and as institutional investor participation grows, the dynamics of opinion divergence and its pricing implications are likely to evolve significantly. The methodology presented in this chapter provides researchers with the tools to document and analyze these changes as they unfold.

Abarbanell, Jeffery S, William N Lanen, and Robert E Verrecchia. 1995. “Analysts’ Forecasts as Proxies for Investor Beliefs in Empirical Research.” Journal of Accounting and Economics 20 (1): 31–60.

Alexandridis, George, Antonios Antoniou, and Dimitris Petmezas. 2007. “Divergence of Opinion and Post-Acquisition Performance.” Journal of Business Finance & Accounting 34 (3-4): 439–60.

Amihud, Yakov. 2002. “Illiquidity and Stock Returns: Cross-Section and Time-Series Effects.” Journal of Financial Markets 5 (1): 31–56.

Anderson, Anne-Marie, and Edward A Dyl. 2005. “Market Structure and Trading Volume.” Journal of Financial Research 28 (1): 115–31.

Anderson, Kirsten L, Jeffrey H Harris, and Eric C So. 2007. “Opinion Divergence and Post-Earnings Announcement Drift.” Available at SSRN 969736.

Berkman, Henk, Valentin Dimitrov, Prem C Jain, Paul D Koch, and Sheri Tice. 2009. “Sell on the News: Differences of Opinion, Short-Sales Constraints, and Returns Around Earnings Announcements.” Journal of Financial Economics 92 (3): 376–99.

Boehme, Rodney D, Bartley R Danielsen, and Sorin M Sorescu. 2006. “Short-Sale Constraints, Differences of Opinion, and Overvaluation.” Journal of Financial and Quantitative Analysis 41 (2): 455–87.

Chatterjee, Sris, Kose John, and An Yan. 2012. “Takeovers and Divergence of Investor Opinion.” The Review of Financial Studies 25 (1): 227–77.

Cheong, Foong Soon, and Jacob Thomas. 2011. “Why Do EPS Forecast Error and Dispersion Not Vary with Scale? Implications for Analyst and Managerial Behavior.” Journal of Accounting Research 49 (2): 359–401.

Chung, Kee H, and Hao Zhang. 2014. “A Simple Approximation of Intraday Spreads Using Daily Data.” Journal of Financial Markets 17: 94–120.

Diether, Karl B, Christopher J Malloy, and Anna Scherbina. 2002. “Differences of Opinion and the Cross Section of Stock Returns.” The Journal of Finance 57 (5): 2113–41.

Dimson, Elroy. 1979. “Risk Measurement When Shares Are Subject to Infrequent Trading.” Journal of Financial Economics 7 (2): 197–226.

Doukas, John A, Chansog Francis Kim, and Christos Pantzalis. 2006. “Divergence of Opinion and Equity Returns.” Journal of Financial and Quantitative Analysis 41 (3): 573–606.

Doukas, John A, Chansog Kim, and Christos Pantzalis. 2004. “Divergent Opinions and the Performance of Value Stocks.” Financial Analysts Journal 60 (6): 55–64.

Fama, Eugene F, and Kenneth R French. 1993. “Common Risk Factors in the Returns on Stocks and Bonds.” Journal of Financial Economics 33 (1): 3–56. https://doi.org/10.1016/0304-405X(93)90023-5.

Garfinkel, Jon A. 2009. “Measuring Investors’ Opinion Divergence.” Journal of Accounting Research 47 (5): 1317–48.

Garfinkel, Jon A, and Jonathan Sokobin. 2006. “Volume, Opinion Divergence, and Returns: A Study of Post–Earnings Announcement Drift.” Journal of Accounting Research 44 (1): 85–112.

Handa, Puneet, Robert Schwartz, and Ashish Tiwari. 2003. “Quote Setting and Price Formation in an Order Driven Market.” Journal of Financial Markets 6 (4): 461–89.

Harris, Milton, and Artur Raviv. 1993. “Differences of Opinion Make a Horse Race.” The Review of Financial Studies 6 (3): 473–506.

Hong, Harrison, and Jeremy C Stein. 2003. “Differences of Opinion, Short-Sales Constraints, and Market Crashes.” The Review of Financial Studies 16 (2): 487–525.

Houge, Todd, Tim Loughran, Gerry Suchanek, and Xuemin Yan. 2001. “Divergence of Opinion, Uncertainty, and the Quality of Initial Public Offerings.” Financial Management, 5–23.

Kandel, Eugene, and Neil D Pearson. 1995. “Differential Interpretation of Public Signals and Trade in Speculative Markets.” Journal of Political Economy 103 (4): 831–72.

Milgrom, Paul, and Nancy Stokey. 1982. “Information, Trade and Common Knowledge.” Journal of Economic Theory 26 (1): 17–27.

Miller, Edward M. 1977. “Risk, Uncertainty, and Divergence of Opinion.” The Journal of Finance 32 (4): 1151–68.

Nguyen, Du D, and Minh C Pham. 2018. “Search-Based Sentiment and Stock Market Reactions: An Empirical Evidence in Vietnam.” The Journal of Asian Finance, Economics and Business 5 (4): 45–56.

Phan, Thi Nha Truc, Philippe Bertrand, Hong Hai Phan, and Xuan Vinh Vo. 2023. “The Role of Investor Behavior in Emerging Stock Markets: Evidence from Vietnam.” The Quarterly Review of Economics and Finance 87: 367–76.

Scheinkman, Jose A, and Wei Xiong. 2003. “Overconfidence and Speculative Bubbles.” Journal of Political Economy 111 (6): 1183–1220.

Shalen, Catherine T. 1993. “Volume, Volatility, and the Dispersion of Beliefs.” The Review of Financial Studies 6 (2): 405–34.

Varian, Hal R. 1985. “Divergence of Opinion in Complete Markets: A Note.” The Journal of Finance 40 (1): 309–17.

Vo, Xuan Vinh. 2015. “Foreign Ownership and Stock Return Volatility–Evidence from Vietnam.” Journal of Multinational Financial Management 30: 101–9.

———. 2017. “Do Foreign Investors Improve Stock Price Informativeness in Emerging Equity Markets? Evidence from Vietnam.” Research in International Business and Finance 42: 986–91.

Vo, Xuan Vinh, and Dang Bao Anh Phan. 2017. “Further Evidence on the Herd Behavior in Vietnam Stock Market.” Journal of Behavioral and Experimental Finance 13: 33–41.

# Measuring Divergence of Investor Opinion A foundational question in financial economics concerns how differences in investor beliefs affect asset prices and trading activity. In markets where investors hold heterogeneous expectations about a firm's future cash flows, the aggregation of these divergent views into a single market price becomes a non-trivial exercise with profound implications for asset valuation, return predictability, and market efficiency. The concept of **divergence of investor opinion** (hereafter DIVOP) has emerged as a central construct in both the accounting and finance literatures, serving as a lens through which researchers examine the information environment of firms, the dynamics of uncertainty resolution, and the nature of market reactions to news. The theoretical foundations of the DIVOP literature trace back to @miller1977risk, who proposed that when investors disagree about the value of a security and short-sale constraints prevent pessimistic investors from fully expressing their views, the market price will reflect the valuation of the most optimistic investors. This leads to systematic overpricing that is increasing in the degree of opinion divergence. The overpricing persists until information events, such as earnings announcements, reduce disagreement and prices converge toward fundamental values [@berkman2009sell]. @varian1985divergence offers an alternative perspective in which divergence of opinion represents an additional risk factor, leading to *higher* rather than lower expected returns, creating a theoretical tension that has motivated extensive empirical investigation. The empirical literature on DIVOP has expanded considerably since these seminal contributions. 
Researchers have documented that divergence of opinion helps explain a range of asset pricing anomalies, including post-earnings announcement drift [@garfinkel2006volume; @anderson2007opinion], the cross-sectional return difference between value and growth stocks [@doukas2004divergent], short- and long-run post-IPO returns [@houge2001divergence], pre- and post-acquisition stock returns [@alexandridis2007divergence], takeover premia [@chatterjee2012takeovers], and the broad cross-section of stock returns [@diether2002differences; @doukas2006divergence]. The explanatory power of DIVOP has been demonstrated using a rich set of empirical proxies, ranging from analyst forecast dispersion and abnormal trading volume to bid-ask spreads and idiosyncratic volatility. Despite the maturity of the DIVOP literature in developed markets, particularly the United States, its application to emerging markets remains remarkably thin. This gap is especially notable given that the theoretical conditions under which divergence of opinion matters most (namely, binding short-sale constraints, information asymmetry, and heterogeneous investor sophistication) are arguably *more* prevalent in emerging markets than in their developed counterparts. The Vietnamese equity market presents a compelling laboratory for studying investor disagreement. The market is characterized by several features that amplify the relevance of the DIVOP framework: 1. **Binding short-sale constraints.** Short selling was not permitted in Vietnam until January 2025, and even after its introduction, the mechanism remains restricted to a limited set of securities with significant regulatory constraints on execution. This closely mirrors the theoretical setting of @miller1977risk, where pessimistic investors are unable to fully express their views through short positions. 2. 
**Dominance of retail investors.** Individual investors account for approximately 80-85% of daily trading volume on HOSE and HNX, compared to roughly 25% in the United States. Retail investors are more susceptible to behavioral biases, sentiment-driven trading, and information processing limitations that naturally give rise to heterogeneous beliefs [@phan2023role]. 3. **Information asymmetry and transparency challenges.** Despite improvements in disclosure standards, Vietnam's regulatory framework for corporate reporting remains less stringent than those in developed markets. Selective disclosure, delayed filing of financial statements, and limited enforcement of insider trading regulations create an environment in which investors operate with substantially different information sets [@vo2017further]. 4. **Foreign ownership limits.** Caps on foreign ownership (currently 49% for most sectors, with exceptions) create a segmented market where domestic and foreign investors may hold systematically different views about firm value, amplifying the divergence of opinion. 5. **Thin analyst coverage.** Whereas a typical S&P 500 firm is followed by 15-25 sell-side analysts, coverage of Vietnamese equities is concentrated among a relatively small number of domestic brokerages and a handful of international research houses. This limits the informativeness of traditional analyst-based DIVOP measures and necessitates greater reliance on market-based proxies. This chapter provides a methodology for constructing multiple proxies for divergence of investor opinion adapted to the institutional characteristics of the Vietnamese market. We draw on the methodological frameworks established by @garfinkel2009measuring and @diether2002differences, while introducing modifications that account for the microstructure of Vietnamese exchanges, the $T+2$ settlement cycle, the absence (until recently) of short selling, and the availability of data through domestic financial platforms. 
Specifically, we construct and analyze the following DIVOP proxies: - **Unexplained Volume (DTO):** Market-adjusted turnover detrended by its rolling median, capturing abnormal trading activity attributable to disagreement after controlling for liquidity and market-wide effects. - **Standardized Unexplained Volume (SUV):** A regression-based measure that explicitly controls for the informedness and liquidity components of volume by modeling turnover as a function of signed returns. - **Stock Return Volatility (VOLATILITY):** The standard deviation of daily returns over a rolling estimation window, serving as a proxy for the dispersion of investor valuations. - **Bid-Ask Spread (BASPREAD):** The proportional quoted spread, reflecting the adverse selection component associated with heterogeneous information among market participants. - **Analyst Forecast Dispersion (DISP):** The cross-sectional standard deviation of individual analyst earnings forecasts, directly measuring disagreement among informed market participants. - **Idiosyncratic Volatility (IVOL):** The residual volatility from a factor model regression, isolating the firm-specific component of return variation that reflects divergent investor interpretations of firm-level information. - **Amihud Illiquidity (ILLIQ):** The price impact ratio proposed by @amihud2002illiquidity, which captures the information asymmetry dimension of disagreement through the price response to order flow. For each proxy, we describe the theoretical motivation, the data requirements, the construction methodology adapted for Vietnamese data, the empirical properties observed in the Vietnamese cross-section, and the practical considerations that researchers should bear in mind when employing these measures. 
We pay particular attention to issues that are specific to emerging markets, including thin trading, corporate action adjustments, exchange-specific microstructure effects, and the interplay between foreign ownership constraints and measures of investor disagreement. ------------------------------------------------------------------------ # Theoretical Framework {#sec-theoretical-framework} ## The Miller (1977) Overpricing Hypothesis The canonical model of divergence of opinion and asset pricing begins with @miller1977risk. Miller's central insight is simple: in a market where investors hold heterogeneous beliefs about the future payoffs of a risky asset and short-sale constraints prevent some investors from acting on their pessimistic views, the equilibrium price will be set by the subset of investors who are most optimistic about the asset's value. The severity of overpricing is increasing in both the degree of opinion divergence and the stringency of short-sale constraints. Formally, if investor $i$ assigns a valuation $V_i$ to a security, the market price $P$ satisfies: $$ P = E[V_i \mid V_i \geq V^*] $$ where $V^*$ is the marginal investor's valuation, which exceeds the unconditional mean valuation $E[V_i]$ whenever short-sale constraints bind for some investors. The degree of overpricing is: $$ \text{Overpricing} = P - E[V_i] = E[V_i \mid V_i \geq V^*] - E[V_i] $$ which is positive and increasing in the dispersion of the distribution of $V_i$ (i.e., divergence of opinion) and in $V^*$ (i.e., the severity of short-sale constraints). Miller's model generates several testable predictions: - **Cross-sectional prediction:** Stocks with **higher divergence of opinion should have *lower* subsequent returns** as prices gradually correct toward fundamental values. 
- **Time-series prediction:** Information events that reduce disagreement (e.g., earnings announcements) should be associated with negative abnormal returns for high-DIVOP stocks, as the "optimism premium" dissipates. - **Interaction prediction:** The overpricing effect should be strongest among stocks that simultaneously exhibit high divergence of opinion *and* binding short-sale constraints. ## Alternative Theoretical Perspectives @varian1985divergence proposes an alternative framework in which divergence of opinion acts as a risk factor. If investors are risk-averse and disagreement represents genuine uncertainty about future payoffs, then **higher dispersion of beliefs should be associated with *higher* expected returns** as compensation for bearing the additional risk. This creates a sharp empirical dichotomy: the Miller hypothesis predicts a negative DIVOP-return relation, whereas the Varian model predicts a positive relation. The distinction between these theories hinges critically on the market microstructure and institutional setting (\@tbl-divop-theories). | Theoretical Framework | Short-Sale Constraints | DIVOP-Return Relation | Key Mechanism | |:-----------------|:-----------------|:-----------------|:-----------------| | @miller1977risk | Binding | Negative | Optimistic bias in price | | @varian1985divergence | Non-binding | Positive | Risk premium for uncertainty | | @hong2003differences | Binding, gradual info | Negative, time-varying | Slow diffusion of bearish views | | @scheinkman2003overconfidence | Binding, overconfidence | Negative | Speculative bubble premium | : Summary of theoretical predictions for the DIVOP-return relation under different assumptions {#tbl-divop-theories} @hong2003differences extend Miller's framework by incorporating gradual information diffusion. In their model, bearish information is impounded into prices more slowly than bullish information because short-sale constraints raise the cost of acting on negative views. 
This generates momentum-like patterns in which high-DIVOP stocks exhibit positive short-run returns (as optimists push prices up) followed by negative long-run returns (as bearish information eventually reaches the market). @scheinkman2003overconfidence introduce an additional dimension by noting that when investors are overconfident about their private signals *and* short-sale constraints bind, stock prices contain a "speculative bubble" component that reflects the option value of reselling the asset to a future investor who may be even more optimistic. This model predicts that both high trading volume and high price volatility should be associated with overpricing, providing a theoretical basis for using volume-based and volatility-based DIVOP proxies. ## Relevance to the Vietnamese Market The Vietnamese equity market provides an unusually clean setting for testing the Miller hypothesis. Vietnam's equity market operated without any short-selling mechanism from its inception in 2000 through January 2025, which was a full quarter-century in which the first necessary condition of Miller's model (binding short-sale constraints) was satisfied by regulation rather than by market frictions. Even after the introduction of covered short selling in 2025, the mechanism remains restricted to securities meeting specific liquidity and market capitalization thresholds, and the regulatory environment imposes borrowing requirements that significantly raise the cost of shorting relative to developed markets. The dominance of retail investors amplifies the second necessary condition (i.e., heterogeneous beliefs). Research on the Vietnamese market has documented significant herding behavior [@vo2017further; @vo2015foreign], sentiment-driven trading [@phan2023role; @nguyen2018search], and information asymmetry between domestic and foreign investors [@vo2017foreign]. 
These behavioral characteristics naturally generate wider dispersion of investor valuations compared to markets dominated by institutional investors with access to similar analytical frameworks and information sources. @tbl-divop-vietnam-us compares key institutional features relevant to the DIVOP framework between Vietnam and the United States. | Feature | Vietnam (HOSE/HNX) | United States (NYSE/NASDAQ) | |:-----------------------|:-----------------------|:-----------------------| | Short selling | Introduced Jan 2025 (limited) | Permitted (Reg SHO since 2005) | | Retail investor share of volume | \~80-85% | \~25% | | Settlement cycle | T+2 (T+1 planned for 2026) | T+1 (since May 2024) | | Daily price limits | $\pm$ 7% (HOSE), $\pm$ 10% (HNX) | None | | Foreign ownership cap | 49% (most sectors) | None | | Average analyst coverage (VN30) | 5-10 analysts | 15-25 analysts | | Mandatory quarterly reporting | Yes (since 2012) | Yes | | Options/derivatives market | VN30 Index Futures (since 2017) | Extensive options/futures | : Institutional comparison of Vietnam and the United States relevant to divergence of opinion {#tbl-divop-vietnam-us} The presence of daily price limits ($\pm$ 7% on HOSE and $\pm$ 10% on HNX) creates an additional mechanism through which divergence of opinion can be amplified. When a stock hits its price limit, investors who wish to trade in the direction of the limit are unable to do so, leading to accumulated unfilled orders and delayed price discovery. This institutional feature may create short-term spikes in measured DIVOP that reflect limit-induced friction rather than genuine disagreement. We address this issue in our empirical methodology by flagging limit-hit days and conducting robustness checks that exclude these observations. 
# Data Sources and Sample Construction {#sec-data} ## Data Sources The construction of DIVOP proxies for the Vietnamese market requires daily stock-level trading data and, for the analyst dispersion measures, individual analyst forecast data. We source all data from [DataCore.vn](https://datacore.vn/en), which provides coverage of all securities listed on HOSE, HNX, and the UPCoM (Unlisted Public Company Market) exchange. @tbl-divop-data-sources summarizes the datasets and key variables used in this study. | Dataset | Key Variables | Frequency | |:-----------------------|:-----------------------|:-----------------------| | Daily Stock Trading | Close price, high, low, open, volume, shares outstanding, adjusted price, bid, ask | Daily | | Corporate Actions | Dividends, stock splits, bonus issues, rights offerings | Event-based | | Company Information | Exchange code, industry classification (ICB), listing date, delisting date | Static/Periodic | | Analyst Forecasts | Individual analyst EPS forecasts, announcement dates, fiscal period end, analyst ID, broker name | Per estimate | | Market Index | VN-Index daily returns, VN30 returns, HNX-Index returns | Daily | | Foreign Ownership | Foreign buy/sell volume, foreign ownership percentage, remaining foreign room | Daily | : Data sources and key variables for DIVOP proxy construction {#tbl-divop-data-sources} ## Sample Construction We construct our sample using the following filters, applied sequentially: ```{python} #| label: sample-construction #| code-summary: "Sample construction and initial data loading" import pandas as pd import numpy as np from datetime import datetime, timedelta from sklearn.linear_model import LinearRegression from scipy import stats as scipy_stats import matplotlib.pyplot as plt import matplotlib.dates as mdates import seaborn as sns import warnings warnings.filterwarnings('ignore') # ============================================================================= # Configuration Parameters # 
============================================================================= # Users can modify these parameters to adjust the methodology CONFIG = { # Sample period 'beg_date': '2007-01-01', 'end_date': '2024-12-31', # Estimation windows (in trading days) 'est_window': 60, # Rolling window for SUV and volatility 'detrend_window': 180, # Window for DTO detrending median 'lag': 7, # Lag for DTO detrending 'gap': 5, # Gap between estimation period and event date # Filters 'min_price': 1000, # Minimum price in VND 'min_volume_days': 0.8, # Min fraction of non-zero volume days in window 'min_analysts': 3, # Minimum number of analysts for DISP 'max_spread_pct': 0.50, # Maximum bid-ask spread as fraction of midpoint 'forecast_carry_days': 105,# Days to carry forward stale analyst forecasts # Exchange identifiers 'exchanges': ['HOSE', 'HNX'], # Price limit thresholds (for flagging) 'price_limit_hose': 0.07, 'price_limit_hnx': 0.10, } print("Configuration parameters loaded successfully.") print(f"Sample period: {CONFIG['beg_date']} to {CONFIG['end_date']}") print(f"Estimation window: {CONFIG['est_window']} trading days") print(f"Detrending window: {CONFIG['detrend_window']} trading days") ``` The sample universe includes all common stocks (ordinary shares) listed on HOSE and HNX during the period January 2007 through December 2024. We begin in 2007 rather than at market inception (2000 for HOSE, 2005 for HNX) for two reasons. First, the early years of the Vietnamese market were characterized by an extremely small number of listed firms (fewer than 30 on HOSE through 2005), making cross-sectional analysis unreliable. Second, data quality and consistency improve substantially after the market expansion of 2006-2007, during which the number of listed firms on HOSE grew from approximately 40 to over 100. We apply the following filters to construct the analysis sample: 1. 
**Security type filter.** We retain only common stocks (ordinary shares), excluding preferred shares, exchange-traded funds (ETFs), covered warrants, and certificates of deposit. This is analogous to the standard filter in the U.S. literature that restricts to CRSP share codes 10 and 11. 2. **Exchange filter.** We include stocks listed on HOSE and HNX but exclude UPCoM securities in our baseline analysis. UPCoM is a registration-based trading venue with less stringent listing requirements and substantially lower liquidity, which may introduce noise into volume-based and spread-based measures. We include UPCoM in robustness checks. 3. **Price filter.** We exclude stock-day observations with closing prices below 1,000 VND. This threshold serves the same purpose as the "penny stock" exclusion common in U.S. studies (typically \$1 or \$5 thresholds) and helps mitigate the influence of extreme percentage returns and spreads at very low price levels. 4. **Minimum trading activity.** For volume-based measures, we require that a stock has non-zero trading volume on at least 80% of trading days within each estimation window. This filter eliminates the most thinly traded securities for which turnover-based measures would be unreliable. ```{python} #| label: load-and-filter #| code-summary: "Load and filter daily stock data" def load_daily_data(config): """ Load daily stock trading data from DataCore.vn. In practice, this function connects to the DataCore API or reads from a local database/CSV. Here we document the expected schema. 
    Expected columns:
    - ticker: str, stock ticker symbol (e.g., 'VCB', 'HPG', 'VNM')
    - date: datetime, trading date
    - open, high, low, close: float, daily OHLC prices (VND)
    - volume: int, trading volume (shares)
    - shares_outstanding: int, total shares outstanding
    - adjusted_close: float, price adjusted for corporate actions
    - adj_factor: float, cumulative adjustment factor
    - bid, ask: float, best bid/ask at close
    - exchange: str, exchange code ('HOSE', 'HNX', 'UPCOM')
    - industry_icb: str, ICB industry classification code
    - foreign_buy_vol, foreign_sell_vol: int, foreign investor volumes
    - foreign_ownership_pct: float, foreign ownership percentage
    """
    # =========================================================================
    # Replace with actual DataCore API call:
    #   from datacore import Client
    #   client = Client(api_key='YOUR_KEY')
    #   df = client.daily_stock(
    #       start=config['beg_date'], end=config['end_date'],
    #       exchanges=config['exchanges']
    #   )
    # =========================================================================
    print("Connect to DataCore.vn and load daily stock data.")
    print("Expected schema: ticker, date, open, high, low, close, volume,")
    print("  shares_outstanding, adjusted_close, adj_factor, bid, ask,")
    print("  exchange, industry_icb, foreign_buy_vol, foreign_sell_vol,")
    print("  foreign_ownership_pct")
    return None  # Replace with actual data


def apply_sample_filters(df, config):
    """Apply sequential sample construction filters."""
    print("\n=== Sample Construction ===")
    n0 = len(df)

    # Date filter
    df = df[(df['date'] >= config['beg_date']) &
            (df['date'] <= config['end_date'])].copy()
    print(f"[1] Date filter: {len(df):,} obs (from {n0:,})")

    # Exchange filter
    df = df[df['exchange'].isin(config['exchanges'])].copy()
    print(f"[2] Exchange filter ({config['exchanges']}): {len(df):,} obs")

    # Price filter
    df = df[df['close'] >= config['min_price']].copy()
    print(f"[3] Price >= {config['min_price']:,} VND: {len(df):,} obs")

    # Compute daily return from adjusted prices
    df = df.sort_values(['ticker', 'date'])
    df['ret'] = df.groupby('ticker')['adjusted_close'].pct_change()

    # Flag price limit hits
    df['limit_hit'] = (
        ((df['exchange'] == 'HOSE') &
         (df['ret'].abs() >= config['price_limit_hose'] - 0.001)) |
        ((df['exchange'] == 'HNX') &
         (df['ret'].abs() >= config['price_limit_hnx'] - 0.001))
    )

    n_tickers = df['ticker'].nunique()
    print(f"\nFinal sample: {len(df):,} stock-day obs, "
          f"{n_tickers} unique tickers")
    print(f"Limit-hit days: {df['limit_hit'].sum():,} "
          f"({100*df['limit_hit'].mean():.2f}%)")
    return df
```

## Corporate Action Adjustments {#sec-corp-actions}

Proper adjustment for corporate actions is critical for volume-based DIVOP measures, as events such as stock splits, bonus share issues, and rights offerings change the number of shares outstanding and can create artificial spikes in measured turnover. We need to use cumulative adjustment factors that account for stock dividends (bonus shares), stock splits, rights offerings, and cash dividends (price adjustment only). We use these to construct adjusted volume and adjusted shares outstanding:

$$
\text{AdjVolume}_{i,t} = \text{Volume}_{i,t} \times \text{CumAdjFactor}_{i,t}
$$

$$
\text{AdjSharesOut}_{i,t} = \text{SharesOut}_{i,t} \times \text{CumAdjFactor}_{i,t}
$$

This ensures that the turnover ratio is consistent across corporate action events.
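As a toy illustration of the two formulas above, the sketch below applies a cumulative factor to a hypothetical stock that undergoes a 2-for-1 split on the third day. The data are invented, and the factor convention (pre-split rows carry 2.0, post-split rows carry 1.0, so that everything is expressed in post-split shares) is an assumption made for the example rather than DataCore's documented convention.

```python
import pandas as pd

# Hypothetical stock: a 2-for-1 split takes effect on the third trading day.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-02', '2024-01-03',
                            '2024-01-04', '2024-01-05']),
    'volume': [100, 100, 200, 200],                  # raw shares traded
    'shares_outstanding': [1000, 1000, 2000, 2000],  # raw share count
    'adj_factor': [2.0, 2.0, 1.0, 1.0],              # assumed convention
})

# AdjVolume and AdjSharesOut as defined in the text
toy['adj_volume'] = toy['volume'] * toy['adj_factor']
toy['adj_shares_out'] = toy['shares_outstanding'] * toy['adj_factor']
toy['turnover'] = toy['adj_volume'] / toy['adj_shares_out']

print(toy[['date', 'volume', 'adj_volume', 'turnover']])
```

In this example raw volume doubles at the split while adjusted volume stays at 200 shares and turnover stays at 10% on every day. Because the same factor scales numerator and denominator, same-day turnover is unaffected by the factor itself; the adjustment matters when the two raw series are not updated on the same date, or when volume levels are compared across dates.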
```{python}
#| label: corp-action
#| code-summary: "Corporate action adjustment"

def adjust_for_corporate_actions(df):
    """Apply cumulative adjustment factors to volume and shares outstanding."""
    df = df.copy()
    df['adj_volume'] = df['volume'] * df['adj_factor']
    df['adj_shares_out'] = df['shares_outstanding'] * df['adj_factor']

    # Daily turnover ratio
    df['turnover'] = np.where(
        df['adj_shares_out'] > 0,
        df['adj_volume'] / df['adj_shares_out'],
        np.nan
    )

    # Flag extreme turnover (> 50% of adjusted shares outstanding)
    extreme = df['turnover'] > 0.50
    if extreme.any():
        print(f"Warning: {extreme.sum()} obs with turnover > 50%, set to NaN")
        df.loc[extreme, 'turnover'] = np.nan
    return df
```

## Trading Calendar Construction {#sec-calendar}

The rolling regression approach for SUV and volatility requires a trading calendar that ensures each estimation window contains exactly the specified number of trading days. We construct this directly from observed trading dates.

```{python}
#| label: trading-calendar
#| code-summary: "Build trading calendar for rolling estimation windows"

def build_trading_calendar(df, config):
    """
    Map each trading date to its estimation window [est_start, est_end].

    For date t, the estimation window runs from t - gap - est_window
    to t - gap - 1 (in trading-day terms).
    """
    trading_dates = sorted(df['date'].unique())
    trading_dates = pd.Series(trading_dates)

    est_window = config['est_window']
    gap = config['gap']
    offset = est_window + gap

    records = []
    for i in range(offset, len(trading_dates)):
        records.append({
            'date': trading_dates.iloc[i],
            'est_start': trading_dates.iloc[i - gap - est_window],
            'est_end': trading_dates.iloc[i - gap - 1]
        })

    calendar = pd.DataFrame(records)
    print(f"Trading calendar: {len(calendar)} dates, "
          f"{calendar['date'].min()} to {calendar['date'].max()}")
    return calendar
```

# Volume-Based DIVOP Proxies {#sec-volume-based}

## Theoretical Motivation

Trading volume has long been recognized as a natural proxy for divergence of investor opinion.
In the rational expectations framework of @milgrom1982information, trade occurs only when investors disagree about the value of a security: the "no-trade theorem" implies, by contrapositive, that observed trading volume must reflect some form of heterogeneous beliefs. @harris1993differences and @kandel1995differential formalize this intuition, showing that trading volume is positively related to the dispersion of investors' prior beliefs and to the degree to which public information is differentially interpreted.

The challenge in using raw trading volume as a DIVOP proxy is that volume is also driven by factors unrelated to disagreement, including portfolio rebalancing, liquidity needs, tax-loss selling, and index reconstitution effects. @garfinkel2009measuring proposes two approaches to extract the disagreement component from raw volume. The first, **Unexplained Volume (DTO)**, removes market-wide volume effects and secular trends. The second, **Standardized Unexplained Volume (SUV)**, additionally controls for the information content of returns through a firm-specific time-series regression of turnover on returns, isolating the "pure disagreement" component of trading activity.

## Unexplained Volume (DTO) {#sec-dto}

### Construction Methodology

The construction of the Unexplained Volume measure proceeds in four steps.

**Step 1: Compute firm-level daily turnover.** For each stock $i$ on day $t$:

$$
\text{Turn}_{i,t} = \frac{\text{AdjVolume}_{i,t}}{\text{AdjSharesOut}_{i,t}}
$$

**Step 2: Compute market-wide turnover.** We calculate aggregate turnover across all common stocks as total adjusted volume divided by total adjusted shares outstanding, which is equivalent to a shares-outstanding-weighted average of firm-level turnover:

$$
\text{MktTurn}_{t} = \frac{\sum_{i} \text{AdjVolume}_{i,t}}{\sum_{i} \text{AdjSharesOut}_{i,t}}
$$

Unlike the U.S. methodology that computes market turnover across NYSE/AMEX stocks only and applies a scaling adjustment for NASDAQ securities [following @anderson2005market], we compute market turnover across all HOSE and HNX common stocks without any exchange-specific volume scaling.
Both Vietnamese exchanges operate as order-driven markets (each combines continuous order matching with periodic call auctions) without the dealer-market double-counting issue that necessitates the NASDAQ volume adjustment in U.S. studies.

**Step 3: Compute market-adjusted turnover.**

$$
\text{MATO}_{i,t} = \text{Turn}_{i,t} - \text{MktTurn}_{t}
$$

**Step 4: Detrend by rolling median.** To remove secular trends in firm-specific trading activity:

$$
\text{DTO}_{i,t} = \text{MATO}_{i,t} - \text{Median}_{180}(\text{MATO}_{i,t-7})
$$

where $\text{Median}_{180}(\text{MATO}_{i,t-7})$ is the median of market-adjusted turnover over the 180-trading-day window ending 7 days before date $t$. The 7-day lag prevents the current day's turnover from influencing its own detrending baseline.

```{python}
#| label: dto-construction
#| code-summary: "Construct the Unexplained Volume (DTO) measure"

def compute_market_turnover(df):
    """Compute daily market-wide turnover across all stocks."""
    totals = df.groupby('date')[['adj_volume', 'adj_shares_out']].sum()
    # Leave market turnover missing when no shares are outstanding on a date
    denom = totals['adj_shares_out'].where(totals['adj_shares_out'] > 0)
    mkt_turn = (totals['adj_volume'] / denom).rename('market_turnover')
    return mkt_turn.reset_index()


def compute_dto(df, config):
    """
    Construct Unexplained Volume (DTO).

    Steps:
    1. Subtract market turnover -> MATO
    2. Rolling 180-day median of MATO (lagged 7 days) -> trend
    3.
       DTO = MATO - trend
    """
    detrend_window = config['detrend_window']
    lag = config['lag']

    # Market turnover
    mkt_turn = compute_market_turnover(df)
    df = df.merge(mkt_turn, on='date', how='left')

    # Market-adjusted turnover
    df['mato'] = df['turnover'] - df['market_turnover']

    # Rolling median with lag, computed per stock
    df = df.sort_values(['ticker', 'date'])

    def _rolling_median_lagged(mato):
        med = mato.rolling(
            window=detrend_window,
            min_periods=int(detrend_window * 0.5)
        ).median()
        return med.shift(lag)

    # transform keeps the original row index, so the result aligns with df
    # even when the panel contains a single ticker
    df['mato_trend'] = (
        df.groupby('ticker')['mato'].transform(_rolling_median_lagged)
    )

    # DTO
    df['dto'] = df['mato'] - df['mato_trend']

    print("DTO construction complete.")
    print(f"  Non-missing: {df['dto'].notna().sum():,}")
    print(f"  Mean: {df['dto'].mean():.6f}, Std: {df['dto'].std():.6f}")
    return df
```

### Vietnam-Specific Considerations for DTO

Several features of the Vietnamese market require attention when constructing DTO:

1. **No NASDAQ-type volume adjustment needed.** Both HOSE and HNX are order-driven auction markets. The double-counting adjustment applied to NASDAQ securities in the U.S. literature is not necessary.

2. **Thinly traded stocks.** A substantial fraction of listed Vietnamese stocks, particularly on HNX, may have zero volume on many trading days. For stocks with intermittent trading, the rolling median may be biased toward zero, making DTO less informative. We require at least 80% non-zero volume days in each estimation window.

3. **Price limit effects on volume.** When a stock hits its daily price limit, unfilled orders accumulate and recorded volume may understate true clearing volume. The following day often shows a "catch-up" effect. Researchers should consider flagging limit-hit days.

4. **Foreign investor trading decomposition.** DataCore provides volume by investor type (foreign versus domestic).
   Researchers may wish to construct separate DTO measures for foreign and domestic volume, or use the foreign-to-domestic volume ratio as an additional dimension of disagreement.

## Standardized Unexplained Volume (SUV) {#sec-suv}

### Construction Methodology

The Standardized Unexplained Volume measure, proposed by @garfinkel2009measuring, isolates the disagreement component of volume by explicitly controlling for the information content of returns. The insight is that trading volume has both a **liquidity** component and an **informedness** component correlated with the magnitude and sign of returns. By regressing turnover on signed returns and extracting the standardized residual, SUV captures volume attributable to disagreement after controlling for both liquidity trends and information-driven trading.

For each stock $i$, on each trading date $t$, we estimate using data from the estimation window $[\tau_1, \tau_2]$:

$$
\text{Turn}_{i,s} = \alpha_i + \beta_i^{+} \cdot \text{RetPos}_{i,s} + \beta_i^{-} \cdot \text{RetNeg}_{i,s} + \epsilon_{i,s}, \quad s \in [\tau_1, \tau_2]
$$ {#eq-suv-regression}

where $\text{RetPos}_{i,s} = |r_{i,s}| \cdot \mathbf{1}(r_{i,s} > 0)$ and $\text{RetNeg}_{i,s} = |r_{i,s}| \cdot \mathbf{1}(r_{i,s} < 0)$. The Standardized Unexplained Volume on date $t$ is:

$$
\text{SUV}_{i,t} = \frac{\text{Turn}_{i,t} - \hat{\text{Turn}}_{i,t}}{\hat{\sigma}_{\epsilon,i}}
$$ {#eq-suv}

where $\hat{\text{Turn}}_{i,t}$ is the predicted turnover and $\hat{\sigma}_{\epsilon,i}$ is the RMSE from @eq-suv-regression.

The asymmetric specification with separate coefficients for positive and negative returns reflects that the volume-return relation differs by return sign. In the U.S., buying pressure tends to generate more volume than selling pressure due to short-sale frictions. In Vietnam, where short selling was unavailable until 2025, this asymmetry should be even more pronounced because all selling activity was constrained to existing shareholders.
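Before the full rolling implementation, the two equations can be traced on simulated data for a single stock and a single event day. This is only a sketch under stated assumptions: the helper name `suv_one_day`, the simulated coefficients (0.01, 0.5, 0.2), and the noise level are invented for illustration, and the fit uses plain least squares from NumPy rather than any particular regression library.

```python
import numpy as np

def suv_one_day(turn_est, ret_est, turn_evt, ret_evt):
    """Toy SUV for one stock-date (hypothetical helper, not the chapter's API)."""
    turn_est = np.asarray(turn_est, dtype=float)
    ret_est = np.asarray(ret_est, dtype=float)
    # Design matrix: intercept, RetPos = |r| 1(r > 0), RetNeg = |r| 1(r < 0)
    X = np.column_stack([
        np.ones_like(ret_est),
        np.where(ret_est > 0, np.abs(ret_est), 0.0),
        np.where(ret_est < 0, np.abs(ret_est), 0.0),
    ])
    coef, *_ = np.linalg.lstsq(X, turn_est, rcond=None)
    resid = turn_est - X @ coef
    rmse = np.sqrt(np.mean(resid ** 2))           # sigma-hat of the residuals
    x_evt = np.array([1.0, max(ret_evt, 0.0), max(-ret_evt, 0.0)])
    pred = float(x_evt @ coef)                    # predicted event-day turnover
    return {'alpha': coef[0], 'beta_pos': coef[1], 'beta_neg': coef[2],
            'rmse': rmse, 'predicted': pred,
            'suv': (turn_evt - pred) / rmse}

# Simulated estimation window of 250 days (all parameters are assumptions)
rng = np.random.default_rng(0)
ret = rng.normal(0.0, 0.02, size=250)
noise = rng.normal(0.0, 0.001, size=250)
turn = (0.01 + 0.5 * np.where(ret > 0, ret, 0.0)
        + 0.2 * np.where(ret < 0, -ret, 0.0) + noise)

# Event day: +3% return, turnover three noise standard deviations above
# the level implied by the simulated model
out = suv_one_day(turn, ret, turn_evt=0.01 + 0.5 * 0.03 + 0.003, ret_evt=0.03)
print({k: round(float(v), 4) for k, v in out.items()})
```

Because the event-day turnover is constructed to sit about three residual standard deviations above its expected level, the resulting SUV should come out in the neighborhood of 3, which is how the measure is meant to be read: in units of the stock's own unexplained-volume volatility.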
```{python}
#| label: suv-construction
#| code-summary: "Construct Standardized Unexplained Volume (SUV)"

def compute_suv(df, calendar, config):
    """
    Compute Standardized Unexplained Volume via rolling regressions.

    For each stock-date, regress Turn on RetPos and RetNeg over the
    estimation window, then compute SUV = (actual - predicted) / RMSE.
    """
    est_window = config['est_window']
    min_obs = int(est_window * config['min_volume_days'])

    # Prepare signed return components
    df = df.copy()
    ret = df['ret']
    df['ret_pos'] = np.where(ret > 0, ret.abs(), 0.0)
    df['ret_neg'] = np.where(ret < 0, ret.abs(), 0.0)
    # Keep missing returns missing so they are dropped below
    # instead of being treated as zero-return days
    df.loc[ret.isna(), ['ret_pos', 'ret_neg']] = np.nan

    results = []
    grouped = {t: g for t, g in df.groupby('ticker')}

    for _, cal_row in calendar.iterrows():
        dt = cal_row['date']
        est_s, est_e = cal_row['est_start'], cal_row['est_end']

        for ticker, tdata in grouped.items():
            # Estimation window
            est = tdata[
                (tdata['date'] >= est_s) & (tdata['date'] <= est_e)
            ].dropna(subset=['turnover', 'ret_pos', 'ret_neg'])
            if len(est) < min_obs:
                continue

            # Event date: skip if turnover or the return is missing
            evt = tdata[tdata['date'] == dt]
            if evt.empty or \
                    evt[['turnover', 'ret_pos', 'ret_neg']].isna().values.any():
                continue

            # OLS: Turn = alpha + beta_pos * RetPos + beta_neg * RetNeg
            X = est[['ret_pos', 'ret_neg']].values
            y = est['turnover'].values
            reg = LinearRegression().fit(X, y)
            y_hat = reg.predict(X)
            rmse = np.sqrt(np.mean((y - y_hat) ** 2))
            if rmse <= 0:
                continue

            # Predict and standardize for event date
            X_evt = evt[['ret_pos', 'ret_neg']].values
            pred = reg.predict(X_evt)[0]
            actual = evt['turnover'].values[0]
            suv = (actual - pred) / rmse

            results.append({
                'ticker': ticker,
                'date': dt,
                'suv': suv,
                'predicted_turnover': pred,
                'rmse_turn': rmse,
                'n_est': len(est),
                'alpha_turn': reg.intercept_,
                'beta_pos': reg.coef_[0],
                'beta_neg': reg.coef_[1],
            })

    suv_df = pd.DataFrame(results)
    print(f"SUV: {len(suv_df):,} stock-date obs")
    print(f"  Mean: {suv_df['suv'].mean():.4f}, "
          f"Median: {suv_df['suv'].median():.4f}")
    return suv_df
```

### Interpreting the SUV Regression Coefficients

The
estimated coefficients from @eq-suv-regression are informative about market microstructure. @garfinkel2009measuring reports $\hat{\beta}^{+} > \hat{\beta}^{-}$ for most U.S. stocks. In Vietnam, we expect this asymmetry to be even stronger because:

- **No short selling (pre-2025):** All selling is by existing shareholders, limiting volume response to negative returns.
- **T+2 settlement:** Investors cannot immediately reinvest sale proceeds, further dampening sell-side volume.
- **Price limits:** The $\pm$ 7% (HOSE) and $\pm$ 10% (HNX) daily limits truncate the return distribution, compressing the range of both regressors.

Researchers should report summary statistics of $(\hat{\alpha}, \hat{\beta}^{+}, \hat{\beta}^{-}, R^2)$ across the cross-section and over time.

```{python}
#| label: suv-diagnostics
#| code-summary: "Diagnostic statistics for SUV turnover regressions"

def suv_diagnostics(suv_df):
    """Report cross-sectional summary of SUV regression parameters."""
    print("\n=== SUV Regression Diagnostics ===")
    params = ['alpha_turn', 'beta_pos', 'beta_neg']
    print(suv_df[params].describe(
        percentiles=[.05, .25, .50, .75, .95]
    ).T.to_string(float_format='{:.6f}'.format))

    # Asymmetry test
    diff = suv_df['beta_pos'] - suv_df['beta_neg']
    print(f"\nbeta_pos - beta_neg: mean = {diff.mean():.6f}, "
          f"frac > 0 = {(diff > 0).mean():.3f}")
```

# Volatility-Based DIVOP Proxies {#sec-volatility}

## Total Return Volatility {#sec-total-vol}

### Theoretical Motivation

Stock return volatility serves as a proxy for divergence of opinion through several channels. @shalen1993volume develops a model in which both volume and volatility are increasing in the dispersion of investor beliefs. @scheinkman2003overconfidence predict that higher volatility reflects the speculative trading component driven by overconfident investors who disagree about value.
Empirically, @boehme2006short and @chatterjee2012takeovers use idiosyncratic volatility as a DIVOP proxy and find it positively correlated with other disagreement measures and negatively associated with subsequent returns when short-sale constraints bind.

### Construction

Total return volatility is the standard deviation of daily returns over the rolling estimation window:

$$
\text{VOLATILITY}_{i,t} = \sqrt{\frac{1}{N_i - 1} \sum_{s \in [\tau_1, \tau_2]} (r_{i,s} - \bar{r}_i)^2}
$$ {#eq-volatility}

where $N_i$ is the number of non-missing return observations for stock $i$ in the window $[\tau_1, \tau_2]$.

## Idiosyncratic Volatility (IVOL) {#sec-ivol}

Idiosyncratic volatility isolates firm-specific return variation by removing the systematic component explained by market movements. We compute IVOL from the residuals of a market model:

$$
r_{i,s} = \alpha_i + \beta_i \cdot r_{m,s} + \epsilon_{i,s}, \quad s \in [\tau_1, \tau_2]
$$ {#eq-market-model}

$$
\text{IVOL}_{i,t} = \text{Std}(\hat{\epsilon}_{i,s})
$$ {#eq-ivol}

Researchers may extend this to a @fama1993common three-factor or five-factor model using Vietnamese factor portfolios constructed elsewhere in this book. A richer factor model yields IVOL estimates that better isolate truly idiosyncratic disagreement, at the cost of requiring factor portfolio construction.

```{python}
#| label: volatility-construction
#| code-summary: "Construct total and idiosyncratic volatility"

def compute_volatility(df, calendar, config):
    """
    Compute total return volatility and idiosyncratic volatility
    via rolling estimation windows.

    Total vol = std(returns) in window.
    IVOL = std(residuals) from market model regression.
    """
    est_window = config['est_window']
    min_obs = int(est_window * config['min_volume_days'])

    # Value-weighted market return
    def _vw_ret(g):
        valid = g.dropna(subset=['ret'])
        if valid.empty:
            return np.nan
        w = valid['adj_shares_out'] * valid['close']
        return np.average(valid['ret'], weights=w)

    mkt_ret = df.groupby('date').apply(_vw_ret).reset_index()
    mkt_ret.columns = ['date', 'mkt_ret']
    df = df.merge(mkt_ret, on='date', how='left')

    results = []
    grouped = {t: g for t, g in df.groupby('ticker')}

    for _, cal_row in calendar.iterrows():
        dt = cal_row['date']
        est_s, est_e = cal_row['est_start'], cal_row['est_end']

        for ticker, tdata in grouped.items():
            est = tdata[
                (tdata['date'] >= est_s) & (tdata['date'] <= est_e)
            ].dropna(subset=['ret', 'mkt_ret'])
            if len(est) < min_obs:
                continue

            # Total volatility
            total_vol = est['ret'].std()

            # Market model -> IVOL
            X = est[['mkt_ret']].values
            y = est['ret'].values
            reg = LinearRegression().fit(X, y)
            resid = y - reg.predict(X)
            ivol = np.std(resid, ddof=1)

            results.append({
                'ticker': ticker,
                'date': dt,
                'total_volatility': total_vol,
                'idio_volatility': ivol,
                'market_beta': reg.coef_[0],
                'market_alpha': reg.intercept_,
                'r_squared_mm': reg.score(X, y),
                'n_vol': len(est),
            })

    vol_df = pd.DataFrame(results)
    print(f"Volatility: {len(vol_df):,} stock-date obs")
    print(f"  Total vol (ann. mean): "
          f"{vol_df['total_volatility'].mean() * np.sqrt(252):.4f}")
    print(f"  IVOL (ann. mean): "
          f"{vol_df['idio_volatility'].mean() * np.sqrt(252):.4f}")
    return vol_df
```

### Vietnam-Specific Considerations for Volatility

1. **Price limits compress measured volatility.** Daily limits of $\pm$ 7% (HOSE) and $\pm$ 10% (HNX) mechanically truncate the return distribution, leading to underestimation of true volatility. On limit-hit days, the true equilibrium return may exceed the observed return. Researchers should be aware that volatility-based DIVOP measures may be downward-biased for stocks that frequently hit limits.

2.
   **VN-Index concentration.** The VN-Index is highly concentrated: the top 10 stocks often account for 50-60% of index weight. For small- and mid-cap stocks, an equal-weighted market return or a composite HOSE+HNX index may provide a better market factor in @eq-market-model.

3. **Thin trading and non-synchronous returns.** For thinly traded stocks, consecutive zero-return days can depress measured volatility. The @dimson1979risk adjustment (including lagged and lead market returns in the market model) may help correct for non-synchronous trading bias in the beta estimate, though its effect on IVOL is typically small.

# Spread-Based and Liquidity DIVOP Proxies {#sec-spread}

## Bid-Ask Spread (BASPREAD) {#sec-baspread}

### Theoretical Motivation

The bid-ask spread reflects the adverse selection costs faced by limit order providers. When investors hold heterogeneous beliefs, each trade is more likely to convey private information, raising the adverse selection component of the spread. @handa2003quote show that in order-driven markets the spread widens when divergence of opinion increases because limit order providers face greater risk of being picked off by informed traders. @chung2014simple demonstrate that closing bid-ask spreads from daily data provide a reliable approximation to intraday effective spreads.

### Construction

We compute the proportional bid-ask spread using end-of-day quote data:

$$
\text{BASPREAD}_{i,t} = \frac{\text{Ask}_{i,t} - \text{Bid}_{i,t}}{\text{Midpoint}_{i,t}}
$$ {#eq-baspread}

where $\text{Midpoint}_{i,t} = (\text{Ask}_{i,t} + \text{Bid}_{i,t}) / 2$. When end-of-day bid and ask are unavailable, we use the daily high-low range as a fallback.

Following @chung2014simple, we delete observations where both Bid and Ask are zero, and where the spread exceeds 50% of the midpoint.
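The spread definition and the two deletion rules can be traced on a handful of made-up closing quotes. The quote values below are invented for illustration, and the 50% cutoff is hard-coded here rather than read from a configuration object.

```python
import pandas as pd

# Hypothetical closing quotes (VND); row 1 has no quotes, row 3 is implausibly wide
quotes = pd.DataFrame({
    'bid': [9900.0, 0.0, 20000.0, 10000.0],
    'ask': [10000.0, 0.0, 20100.0, 20000.0],
})

mid = (quotes['ask'] + quotes['bid']) / 2
spread = (quotes['ask'] - quotes['bid']) / mid

# Keep only rows with positive quotes and a spread of at most 50% of the midpoint
valid = (quotes['bid'] > 0) & (quotes['ask'] > 0) & (spread <= 0.50)
quotes['baspread'] = spread.where(valid)

print(quotes)
```

The first and third rows survive with proportional spreads of roughly 1.0% and 0.5%. The zero-quote row and the row whose spread is about two thirds of the midpoint are set to missing.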
## Amihud Illiquidity (ILLIQ) {#sec-amihud}

The @amihud2002illiquidity ratio measures the price impact of order flow:

$$
\text{ILLIQ}_{i,t} = \frac{|r_{i,t}|}{\text{DolVol}_{i,t}}
$$ {#eq-amihud}

where $\text{DolVol}_{i,t} = \text{Volume}_{i,t} \times \text{Price}_{i,t}$ (in billions VND for scaling). Higher ILLIQ indicates a larger price move per unit of trading value, which is commonly interpreted as reflecting greater information asymmetry. We average daily ratios over monthly horizons and use the log transformation due to heavy right skew.

```{python}
#| label: spread-illiq
#| code-summary: "Construct bid-ask spread and Amihud illiquidity"

def compute_spread_and_illiq(df, config):
    """Compute bid-ask spread (BASPREAD) and Amihud illiquidity."""
    df = df.copy()

    # --- Bid-Ask Spread ---
    df['midpoint_ba'] = (df['ask'] + df['bid']) / 2
    df['baspread_ba'] = np.where(
        (df['ask'] > 0) & (df['bid'] > 0) & (df['midpoint_ba'] > 0),
        (df['ask'] - df['bid']) / df['midpoint_ba'],
        np.nan
    )

    # Fallback: high/low range
    df['midpoint_hl'] = (df['high'] + df['low']) / 2
    df['baspread_hl'] = np.where(
        (df['high'] > 0) & (df['low'] > 0) & (df['midpoint_hl'] > 0),
        (df['high'] - df['low']) / df['midpoint_hl'],
        np.nan
    )

    df['baspread'] = df['baspread_ba'].fillna(df['baspread_hl'])
    df['midpoint'] = df['midpoint_ba'].fillna(df['midpoint_hl'])

    # Chung and Zhang (2014) filters
    bad = (df['baspread'].isna()) | \
          (df['baspread'] > config['max_spread_pct']) | \
          (df['baspread'] < 0)
    df.loc[bad, 'baspread'] = np.nan

    # --- Amihud Illiquidity ---
    df['dollar_vol'] = df['volume'] * df['close'] / 1e9
    df['amihud_daily'] = np.where(
        df['dollar_vol'] > 0,
        np.abs(df['ret']) / df['dollar_vol'],
        np.nan
    )

    print(f"BASPREAD: {df['baspread'].notna().sum():,} valid obs, "
          f"mean = {df['baspread'].mean():.6f}")
    print(f"AMIHUD: {df['amihud_daily'].notna().sum():,} valid obs, "
          f"mean = {df['amihud_daily'].mean():.6f}")
    return df


def compute_amihud_monthly(df):
    """Monthly Amihud = mean daily |ret|/dollar_vol (min 15 days)."""
    df = df.copy()
    df['ym'] = df['date'].dt.to_period('M')
    agg = df.groupby(['ticker',
                      'ym']).agg(
        illiq_mean=('amihud_daily', 'mean'),
        n_days=('amihud_daily', 'count'),
    ).reset_index()
    agg = agg[agg['n_days'] >= 15].copy()
    agg['log_illiq'] = np.log(agg['illiq_mean'] + 1e-10)
    return agg
```

### Vietnam-Specific Considerations for Spread and Liquidity

1. **Tick size schedule.** Vietnam uses variable tick sizes: 10 VND (prices \< 10,000), 50 VND (10,000--49,950), and 100 VND (≥ 50,000) on HOSE. These impose a floor on quoted spreads for low-priced stocks. Researchers should be cautious interpreting cross-price-decile spread variation as reflecting opinion divergence rather than tick-size mechanics.

2. **Order-driven market structure.** Both HOSE and HNX are pure order-driven markets where public limit orders provide liquidity. This makes the closing-quote spread approximation of @chung2014simple, originally developed on CRSP data, appropriate.

3. **Lot size requirements.** HOSE requires 100-share standard lots for continuous trading. For high-priced stocks, the standard lot represents a large capital commitment, potentially inflating quoted spreads relative to effective trading costs.

4. **Call auction effects.** Opening and closing sessions on HOSE use periodic call auctions, which can produce bid-ask quotes that differ substantially from continuous-trading spreads.

# Analyst Forecast Dispersion {#sec-analyst}

## Theoretical Motivation

Analyst forecast dispersion, the cross-sectional standard deviation of individual analysts' earnings forecasts, is the most direct measure of divergence of opinion. Unlike market-based proxies that capture disagreement indirectly, forecast dispersion directly measures disagreement among informed market participants. @abarbanell1995analysts establish the theoretical basis, and @diether2002differences demonstrate that stocks with higher analyst forecast dispersion earn lower subsequent returns, consistent with the Miller overpricing hypothesis.
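To make the raw quantity concrete, the sketch below computes the cross-analyst standard deviation of EPS forecasts for two hypothetical firms. The tickers, analyst identifiers, column names, and forecast values are all invented for illustration.

```python
import pandas as pd

# Hypothetical outstanding EPS forecasts (VND per share) at one point in time
fc = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'analyst_id': [1, 2, 3, 1, 2],
    'eps_forecast': [1000.0, 1100.0, 1200.0, 500.0, 500.0],
})

# Dispersion = sample standard deviation of forecasts across analysts
disp = fc.groupby('ticker')['eps_forecast'].agg(
    n_analysts='count', mean_fcst='mean', std_fcst='std'
)
print(disp)
```

Analysts disagree about the first firm (a standard deviation of 100 VND around a mean of 1,100) and agree completely about the second, whose dispersion is zero.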
## Data Challenges in Vietnam

Constructing analyst forecast dispersion in Vietnam presents substantial challenges relative to the U.S.:

- **Coverage breadth.** While I/B/E/S covers over 4,000 U.S. companies, only 100--150 Vietnamese firms typically have coverage by at least 3 analysts, concentrated among VN30 constituents.
- **Data sources.** Analyst forecasts are available from DataCore.vn, FiinPro, Bloomberg, and Refinitiv. The choice of source affects coverage and timeliness.
- **Forecast staleness.** With limited coverage, forecasts may go unrevised for months. Following I/B/E/S methodology, we carry each forecast forward for a maximum of 105 days.

## Construction Methodology

The construction proceeds as follows:

1. **Clean individual forecasts.** Remove observations where the review date precedes the announcement date. Keep only annual EPS forecasts. For each analyst-ticker-fiscal period, retain only the latest forecast per calendar month.

2. **Handle stopped and excluded estimates.** Remove forecasts where the analyst has left the brokerage or the estimate has been excluded from consensus.

3. **Carry forward with staleness control.** Each forecast is valid until the earlier of: (a) the next forecast by the same analyst, (b) 105 days after the announcement, or (c) the actual earnings announcement date.

4. **Expand to monthly frequency.** For each ticker-month, identify all valid outstanding forecasts and compute dispersion.

5. **Compute scaled measures:**

$$
\text{DISP1}_{i,m} = \frac{\text{Std}(\hat{\text{EPS}}_{i,m}^{(a)})}{|\text{Mean}(\hat{\text{EPS}}_{i,m}^{(a)})|}
\qquad
\text{DISP2}_{i,m} = \frac{\text{Std}(\hat{\text{EPS}}_{i,m}^{(a)})}{\bar{P}_{i,m}}
$$

```{python}
#| label: analyst-dispersion
#| code-summary: "Construct analyst forecast dispersion (DISP1, DISP2)"

def construct_analyst_dispersion(forecasts_df, price_df, config):
    """
    Construct analyst forecast dispersion measures.

    Parameters
    ----------
    forecasts_df : pd.DataFrame
        Individual analyst forecasts with: ticker, analyst_id, broker,
        fpedats, anndats, revdats, value (EPS), anndats_act.
    price_df : pd.DataFrame
        Monthly price: ticker, month, mean_price.
    config : dict
        With min_analysts, forecast_carry_days.
    """
    carry_days = config['forecast_carry_days']
    min_analysts = config['min_analysts']

    df = forecasts_df.copy()
    # Keep forecasts whose review date is on or after the announcement date
    df = df[df['anndats'] <= df['revdats']].copy()
    df = df.dropna(subset=['fpedats', 'anndats', 'value'])

    # Latest forecast per analyst-month
    df['ym'] = df['anndats'].dt.to_period('M')
    df = df.sort_values(
        ['ticker', 'fpedats', 'analyst_id', 'ym', 'anndats', 'revdats']
    )
    df = df.groupby(['ticker', 'fpedats', 'analyst_id', 'ym']).tail(1)

    # Carry-forward end date. Sort ascending by announcement date so that
    # shift(-1) returns the same analyst's next (later) forecast.
    df = df.sort_values(['ticker', 'analyst_id', 'fpedats', 'anndats'])
    df['next_ann'] = df.groupby(
        ['ticker', 'analyst_id', 'fpedats']
    )['anndats'].shift(-1)

    def _carry_end(row):
        candidates = [row['anndats'] + pd.Timedelta(days=carry_days)]
        if pd.notna(row.get('next_ann')):
            candidates.append(row['next_ann'])
        if pd.notna(row.get('anndats_act')):
            candidates.append(row['anndats_act'])
        return min(candidates)

    df['carry_end'] = df.apply(_carry_end, axis=1)

    # Monthly expansion
    months = pd.period_range(config['beg_date'], config['end_date'], freq='M')
    records = []
    for month in months:
        me = month.to_timestamp(how='end')
        valid = df[(df['anndats'] <= me) & (df['carry_end'] > me)].copy()
        valid = valid[valid['fpedats'] > me]
        valid = valid.sort_values(['ticker', 'analyst_id', 'anndats'])
        valid = valid.groupby(['ticker', 'analyst_id']).tail(1)

        disp = valid.groupby('ticker').agg(
            n_analysts=('analyst_id', 'nunique'),
            mean_fcst=('value', 'mean'),
            std_fcst=('value', 'std'),
        ).reset_index()
        disp['month'] = month
        records.append(disp)

    if not records:
        return pd.DataFrame()

    disp_df = pd.concat(records, ignore_index=True)

    # Scaled measures
    disp_df['disp1'] = np.where(
        disp_df['mean_fcst'].abs() > 0,
        disp_df['std_fcst'] / disp_df['mean_fcst'].abs(),
        np.nan
    )

    disp_df = disp_df.merge(price_df, on=['ticker', 'month'], how='left')
    disp_df['disp2'] = np.where(
        disp_df['mean_price'] > 0,
        disp_df['std_fcst'] / disp_df['mean_price'],
        np.nan
    )
    disp_df['disp_raw'] = disp_df['std_fcst']

    out = disp_df[disp_df['n_analysts'] >= min_analysts].copy()
    print(f"DISP: {len(out):,} ticker-months (>= {min_analysts} analysts)")
    print(f"  Mean analysts: {out['n_analysts'].mean():.1f}")
    return out
```

## Scaling Considerations

Following @cheong2011eps, we note that each scaling choice has pitfalls. DISP1 (scaled by absolute mean forecast) can produce extreme values when the mean forecast approaches zero, which is common for Vietnamese firms near breakeven. DISP2 (scaled by price) introduces a mechanical negative correlation between price and scaled dispersion. We recommend reporting all three versions (DISP1, DISP2, and unscaled DISP_RAW with $\ln(\text{Price})$ as an additional control), and winsorizing DISP1 at the 1st and 99th percentiles.

::: callout-warning
## Caution on Analyst Dispersion in Thin-Coverage Markets

With typical coverage of 5--10 analysts per firm in Vietnam (versus 15--25 in the U.S.), forecast dispersion is estimated with substantially greater noise. A dispersion measure from 3 analysts has a very different sampling distribution than one from 20. Always include the number of analysts as a control and test robustness with varying minimum-analyst thresholds (3, 5, 7).
:::

# Cross-Sectional Correlations Among DIVOP Proxies {#sec-correlation}

An important empirical question is the degree to which the various DIVOP proxies capture the same underlying construct. If divergence of opinion is a well-defined latent variable, we expect positive correlations among all proxies, though correlations need not be high since each captures a different facet of disagreement.
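Several of these proxies are heavily right-skewed (ILLIQ, for instance, is log-transformed above for that reason), so a single extreme observation can dominate a Pearson correlation. The toy comparison below, on invented numbers, shows how a rank-based (Spearman) correlation behaves in that situation. Computing Spearman as the Pearson correlation of the ranks is a standard identity and is used here only to keep the sketch dependent on pandas alone.

```python
import pandas as pd

# Two made-up proxies: y rises with x on every observation, but has one extreme value
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([1.0, 2.0, 4.0, 8.0, 1000.0])

pearson = x.corr(y)                    # sensitive to the outlier
spearman = x.rank().corr(y.rank())     # Pearson correlation of the ranks

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

The ordering of the two series is identical, so the rank correlation is one, while the Pearson coefficient is pulled well below one by the single extreme value.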
```{python}
#| label: correlation-analysis
#| code-summary: "Spearman rank correlations among DIVOP proxies"

def compute_divop_correlations(merged_df, proxies=None):
    """
    Compute and visualize Spearman correlations among DIVOP proxies.

    We use rank correlations because many proxies are right-skewed.
    """
    if proxies is None:
        proxies = [
            'dto', 'suv', 'total_volatility', 'idio_volatility',
            'baspread', 'amihud_daily', 'disp1', 'disp2'
        ]
    available = [p for p in proxies if p in merged_df.columns]
    data = merged_df[available].dropna()

    n = len(available)
    rho_mat = np.eye(n)
    p_mat = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho, p = scipy_stats.spearmanr(
                data[available[i]], data[available[j]]
            )
            rho_mat[i, j] = rho_mat[j, i] = rho
            p_mat[i, j] = p_mat[j, i] = p

    labels = {'dto': 'DTO', 'suv': 'SUV', 'total_volatility': 'VOL',
              'idio_volatility': 'IVOL', 'baspread': 'SPREAD',
              'amihud_daily': 'ILLIQ', 'disp1': 'DISP1', 'disp2': 'DISP2'}
    pretty = [labels.get(c, c) for c in available]
    corr_df = pd.DataFrame(rho_mat, index=pretty, columns=pretty)

    # Heatmap
    fig, ax = plt.subplots(figsize=(9, 7))
    mask = np.triu(np.ones_like(corr_df, dtype=bool), k=1)
    sns.heatmap(
        corr_df, mask=mask, annot=True, fmt='.3f',
        cmap='RdBu_r', center=0, vmin=-0.4, vmax=0.7,
        square=True, linewidths=0.5,
        cbar_kws={'shrink': 0.8, 'label': 'Spearman ρ'},
        ax=ax
    )
    ax.set_title('Spearman Correlations Among DIVOP Proxies\n'
                 'Vietnamese Equity Market',
                 fontsize=13, fontweight='bold')
    plt.tight_layout()
    plt.savefig('divop_correlations.png', dpi=300, bbox_inches='tight')
    plt.show()
    return corr_df
```

### Expected Correlation Patterns

Based on U.S.
evidence and theory, we expect:

| Pair                   | Expected               | Rationale                                         |
|:-----------------------|:-----------------------|:--------------------------------------------------|
| DTO × SUV              | High positive          | Both capture abnormal volume; SUV refines DTO     |
| VOL × IVOL             | High positive          | IVOL is a component of total volatility           |
| SPREAD × ILLIQ         | Moderate-high positive | Both capture information asymmetry                |
| Volume × Volatility    | Moderate positive      | @shalen1993volume links both to belief dispersion |
| Analyst × Market-based | Weak-moderate positive | Different investor populations                    |

: Expected correlation structure among DIVOP proxies {#tbl-expected-corr}

# Descriptive Statistics and Cross-Sectional Properties {#sec-empirical}

## Summary Statistics

```{python}
#| label: descriptive-stats
#| code-summary: "Descriptive statistics for all DIVOP proxies"

def descriptive_statistics(merged_df):
    """Comprehensive descriptive statistics for DIVOP proxies."""
    proxies = {
        'dto': 'Unexplained Volume (DTO)',
        'suv': 'Std Unexplained Volume (SUV)',
        'total_volatility': 'Total Return Volatility',
        'idio_volatility': 'Idiosyncratic Volatility',
        'baspread': 'Bid-Ask Spread',
        'amihud_daily': 'Amihud Illiquidity',
        'disp1': 'Analyst Disp (mean-scaled)',
        'disp2': 'Analyst Disp (price-scaled)',
    }
    avail = {k: v for k, v in proxies.items() if k in merged_df.columns}

    rows = []
    for col, label in avail.items():
        s = merged_df[col].dropna()
        rows.append({
            'Proxy': label,
            'N': f'{len(s):,}',
            'Mean': f'{s.mean():.6f}',
            'Std': f'{s.std():.6f}',
            'P5': f'{s.quantile(.05):.6f}',
            'Median': f'{s.median():.6f}',
            'P95': f'{s.quantile(.95):.6f}',
            'Skew': f'{s.skew():.2f}',
            'Kurt': f'{s.kurtosis():.2f}',
        })

    stats = pd.DataFrame(rows).set_index('Proxy')
    print("\n" + "=" * 90)
    print("Descriptive Statistics of DIVOP Proxies")
    print("Vietnamese Equity Market, HOSE and HNX")
    print("=" * 90)
    print(stats.to_string())
    return stats
```

## DIVOP by Firm Characteristics

```{python}
#| label: by-characteristics
#| code-summary: "DIVOP by size, exchange, and foreign ownership"

def divop_by_size(merged_df):
    """Mean DIVOP proxies by market-cap quintile."""
    df = merged_df.copy()
    df['mkt_cap'] = df['close'] * df['shares_outstanding']
    df['size_q'] = df.groupby('date')['mkt_cap'].transform(
        lambda x: pd.qcut(x, 5,
                          labels=['Q1 Small','Q2','Q3','Q4','Q5 Large'],
                          duplicates='drop')
    )
    proxies = ['dto','suv','total_volatility','idio_volatility',
               'baspread','amihud_daily']
    avail = [p for p in proxies if p in df.columns]
    tab = df.groupby('size_q')[avail].mean()
    print("\n=== Mean DIVOP by Size Quintile ===")
    print(tab.to_string(float_format='{:.6f}'.format))
    return tab


def divop_by_exchange(merged_df):
    """Compare mean DIVOP across HOSE and HNX."""
    proxies = ['dto','suv','total_volatility','idio_volatility',
               'baspread','amihud_daily']
    avail = [p for p in proxies if p in merged_df.columns]
    tab = merged_df.groupby('exchange')[avail].mean()
    print("\n=== Mean DIVOP by Exchange ===")
    print(tab.to_string(float_format='{:.6f}'.format))
    return tab
```

## Time-Series Evolution

```{python}
#| label: time-series-plot
#| code-summary: "Time-series evolution of DIVOP proxies"

def plot_divop_timeseries(merged_df):
    """Plot monthly cross-sectional median DIVOP with crisis shading."""
    df = merged_df.copy()
    df['ym'] = df['date'].dt.to_period('M')
    proxies = ['dto','suv','total_volatility','baspread']
    avail = [p for p in proxies if p in df.columns]
    monthly = df.groupby('ym')[avail].median()
    monthly.index = monthly.index.to_timestamp()

    fig, axes = plt.subplots(len(avail), 1,
                             figsize=(13, 3.5*len(avail)), sharex=True)
    if len(avail) == 1:
        axes = [axes]

    labels = {'dto':'DTO','suv':'SUV',
              'total_volatility':'Volatility','baspread':'Spread'}
    colors = ['#1976D2','#388E3C','#F57C00','#D32F2F']

    for i, (proxy, ax) in enumerate(zip(avail, axes)):
        ax.plot(monthly.index, monthly[proxy],
                color=colors[i], linewidth=1.3)
        ax.set_ylabel(labels.get(proxy, proxy), fontsize=10)
        ax.grid(True, alpha=0.25)
        for s, e, c in [('2008-01','2009-06','red'),
                        ('2020-01','2020-12','orange'),
                        ('2022-09','2023-06','purple')]:
            ax.axvspan(pd.Timestamp(s), pd.Timestamp(e),
                       alpha=0.1, color=c)

    axes[0].set_title(
        'Time-Series of DIVOP Proxies\n'
        'Monthly Cross-Sectional Median, HOSE & HNX',
        fontsize=13, fontweight='bold')

    from matplotlib.patches import Patch
    axes[-1].legend(handles=[
        Patch(facecolor='red', alpha=.2, label='GFC 2008-09'),
        Patch(facecolor='orange', alpha=.2, label='COVID-19'),
        Patch(facecolor='purple', alpha=.2, label='Bond Crisis 2022-23'),
    ], loc='upper right', fontsize=8)

    plt.tight_layout()
    plt.savefig('divop_timeseries.png', dpi=300, bbox_inches='tight')
    plt.show()
```

# Putting It All Together {#sec-pipeline}

```{python}
#| label: merge-all
#| code-summary: "Master pipeline: build full DIVOP dataset"

def build_divop_dataset(config):
    """
    Master pipeline: load data, construct all DIVOP proxies,
    merge into a single stock-date panel.
    """
    df = load_daily_data(config)
    df = apply_sample_filters(df, config)
    df = adjust_for_corporate_actions(df)
    calendar = build_trading_calendar(df, config)

    df = compute_dto(df, config)
    suv_df = compute_suv(df, calendar, config)
    vol_df = compute_volatility(df, calendar, config)
    df = compute_spread_and_illiq(df, config)

    # Merge
    base = df[['ticker', 'date', 'ret', 'close', 'volume',
               'shares_outstanding', 'exchange', 'industry_icb',
               'foreign_ownership_pct', 'turnover',
               'mato', 'dto', 'baspread', 'amihud_daily',
               'limit_hit']].copy()

    if not suv_df.empty:
        base = base.merge(
            suv_df[['ticker', 'date', 'suv', 'predicted_turnover']],
            on=['ticker', 'date'], how='left')

    if not vol_df.empty:
        base = base.merge(
            vol_df[['ticker', 'date', 'total_volatility',
                    'idio_volatility', 'market_beta']],
            on=['ticker', 'date'], how='left')

    print("\n=== Final DIVOP Dataset ===")
    print(f"Shape: {base.shape}")
    print(f"Tickers: {base['ticker'].nunique()}")
    return base
```

# Empirical Applications {#sec-applications}

## Application 1: DIVOP and the Cross-Section of Returns

The fundamental test of the Miller hypothesis is whether
stocks with higher divergence of opinion earn lower subsequent returns. We implement Fama-MacBeth cross-sectional regressions:

$$
r_{i,t+1:t+h} = \gamma_{0,t} + \gamma_{1,t} \cdot \text{DIVOP}_{i,t} + \gamma_{2,t}' \mathbf{X}_{i,t} + \varepsilon_{i,t}
$$

where $\mathbf{X}_{i,t}$ includes controls for market beta, log market capitalization, and, where book equity data are available, log book-to-market ratio (the function below defaults to the first two and accepts further controls through `controls`). The Miller hypothesis predicts $\bar{\gamma}_1 < 0$. Because the regression is run daily on overlapping $h$-day forward returns, the estimated $\gamma_{1,t}$ series is autocorrelated, so the standard error of $\bar{\gamma}_1$ uses a Newey-West correction with $h-1$ lags.

```{python}
#| label: fama-macbeth
#| code-summary: "Fama-MacBeth regressions of returns on DIVOP"

def fama_macbeth_divop(merged_df, divop_proxy='suv',
                       controls=None, horizon=21):
    """
    Fama-MacBeth cross-sectional regressions.
    Miller predicts gamma_1 < 0; Varian predicts gamma_1 > 0.
    """
    if controls is None:
        controls = ['market_beta', 'log_mktcap']

    df = merged_df.copy()
    df = df.sort_values(['ticker', 'date'])
    # Sum of daily returns over t+1, ..., t+horizon
    df['fwd_ret'] = df.groupby('ticker')['ret'].transform(
        lambda x: x.shift(-1).rolling(horizon).sum().shift(-(horizon-1))
    )
    df['log_mktcap'] = np.log(
        df['close'] * df['shares_outstanding'] + 1
    )

    X_cols = [divop_proxy] + [c for c in controls if c in df.columns]
    df_reg = df[['ticker', 'date', 'fwd_ret'] + X_cols].dropna()

    # One cross-sectional OLS per date
    results = []
    for date, cross in df_reg.groupby('date'):
        if len(cross) < 30:
            continue
        y = cross['fwd_ret'].values
        X = np.column_stack([np.ones(len(cross)), cross[X_cols].values])
        try:
            coefs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        except np.linalg.LinAlgError:
            continue
        results.append({
            'date': date,
            'intercept': coefs[0],
            f'gamma_{divop_proxy}': coefs[1],
            'n': len(cross),
        })

    fm = pd.DataFrame(results)
    if fm.empty:
        print("No cross-sections with at least 30 observations.")
        return fm

    gc = f'gamma_{divop_proxy}'
    g = fm[gc].to_numpy()
    T = len(g)
    mu = g.mean()
    # Newey-West (Bartlett kernel) standard error of the mean:
    # overlapping forward returns make consecutive gammas correlated,
    # so std / sqrt(T) would overstate the t-statistic.
    u = g - mu
    n_lags = min(horizon - 1, T - 1)
    lrv = u @ u / T
    for k in range(1, n_lags + 1):
        lrv += 2 * (1 - k / (n_lags + 1)) * (u[k:] @ u[:-k]) / T
    se = np.sqrt(lrv / T)
    t = mu / se

    print(f"\n=== Fama-MacBeth: {divop_proxy} -> "
          f"{horizon}-day fwd returns ===")
    print(f"  Mean gamma: {mu:.6f}, t-stat (Newey-West): {t:.3f}")
    if t < -1.96:
        print("  -> Consistent with Miller (1977)")
    elif t > 1.96:
        print("  -> Consistent with Varian (1985)")
    else:
        print("  -> Inconclusive at 5%")
    return fm
```

## Application 2: DIVOP and Earnings Announcements

Following @berkman2009sell, we test whether high-DIVOP stocks experience negative abnormal returns around earnings announcements, as uncertainty resolution reduces the optimism premium.

```{python}
#| label: ea-event
#| code-summary: "Event study: DIVOP and earnings announcement returns"

def divop_earnings_event(merged_df, ea_dates_df, divop_proxy='suv',
                         window=(-1, 3)):
    """
    Assign each earnings announcement (EA) to a pre-EA DIVOP quintile.
    The returned event table is the input for comparing CARs over
    `window` across quintiles; CARs themselves are not computed here.
    Miller predicts: Q5 (high DIVOP) has lower CAR than Q1 (low DIVOP).
    """
    df = merged_df[['ticker', 'date', divop_proxy]].dropna().copy()
    ea = ea_dates_df.copy()

    # Pre-EA DIVOP value: the last observation on or before the date
    # 5 calendar days ahead of the announcement. An exact-date merge
    # would silently drop events whose pre-date is a weekend or holiday.
    # The 10-day tolerance is a judgment call that discards stale values.
    ea['pre_date'] = ea['ea_date'] - pd.Timedelta(days=5)
    ea = pd.merge_asof(
        ea.sort_values('pre_date'),
        df.rename(columns={'date': 'obs_date'}).sort_values('obs_date'),
        left_on='pre_date', right_on='obs_date', by='ticker',
        direction='backward', tolerance=pd.Timedelta(days=10)
    ).dropna(subset=[divop_proxy])

    # Rank first so ties cannot collapse a quintile edge
    ea['divop_q'] = pd.qcut(
        ea[divop_proxy].rank(method='first'), 5,
        labels=['Q1 Low', 'Q2', 'Q3', 'Q4', 'Q5 High']
    )

    print(f"\n=== EA Event Study by {divop_proxy} quintile ===")
    print(f"  Window: ({window[0]}, {window[1]}) days")
    print("  Miller predicts: Q5 has lower CAR than Q1")
    return ea
```

## Application 3: Composite DIVOP Index via PCA

When a single summary measure of disagreement is needed, PCA on the battery of standardized proxies extracts the common "disagreement factor."
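To make the "disagreement factor" concrete, the construction can be written out in standard PCA notation. This is only a sketch: the symbols $\mathbf{z}_{i,t}$, $\mathbf{R}$, $\mathbf{w}_1$, and $\lambda_1$ are introduced here for illustration and are not used elsewhere in the chapter. Let $\mathbf{z}_{i,t}$ stack the $K$ standardized proxies for stock $i$ on date $t$, and let $\mathbf{R}$ be their $K \times K$ sample correlation matrix. Then

$$
\text{DIVOP}^{\text{PC1}}_{i,t} = \mathbf{w}_1' \mathbf{z}_{i,t},
\qquad
\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\| = 1} \mathbf{w}' \mathbf{R}\, \mathbf{w},
\qquad
\text{share of variance explained} = \frac{\lambda_1}{K},
$$

where $\lambda_1$ is the largest eigenvalue of $\mathbf{R}$ and $\mathbf{w}_1$ is the associated unit-length eigenvector. Because $-\mathbf{w}_1$ solves the same maximization problem, the sign of the index is not identified and has to be fixed by convention, for example so that the index loads positively on most proxies.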
```{python}
#| label: pca-composite
#| code-summary: "Composite DIVOP index via PCA"

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def composite_divop_pca(merged_df, proxies=None):
    """Extract first principal component from standardized DIVOP proxies."""
    if proxies is None:
        proxies = ['dto', 'suv', 'total_volatility', 'idio_volatility',
                   'baspread', 'amihud_daily']
    avail = [p for p in proxies if p in merged_df.columns]
    # .copy() so that adding a column does not write into a view
    data = merged_df[['ticker', 'date'] + avail].dropna().copy()

    scaler = StandardScaler()
    X = scaler.fit_transform(data[avail])

    # Cannot extract more components than there are proxies
    n_comp = min(3, len(avail))
    pca = PCA(n_components=n_comp)
    factors = pca.fit_transform(X)

    pc_names = [f'PC{k + 1}' for k in range(n_comp)]
    loadings = pd.DataFrame(
        pca.components_.T, index=avail, columns=pc_names
    )

    # The sign of a principal component is arbitrary. Orient PC1 so its
    # loadings are positive on balance (higher index = more
    # disagreement), flipping scores and loadings together.
    sign = -1.0 if loadings['PC1'].sum() < 0 else 1.0
    loadings['PC1'] *= sign
    data['divop_composite'] = sign * factors[:, 0]

    print("\n=== PCA Composite DIVOP ===")
    print(f"Variance explained: "
          f"{pca.explained_variance_ratio_.round(3)}")
    print(f"\nLoadings:\n{loadings.to_string(float_format='{:.4f}'.format)}")
    return data[['ticker', 'date', 'divop_composite']], loadings
```

# Conclusion and Practical Recommendations

This chapter has provided a comprehensive methodology for constructing seven distinct proxies for divergence of investor opinion adapted to the Vietnamese equity market. We conclude with practical recommendations:

**1. Prefer multiple proxies.** No single DIVOP measure is without limitations. We recommend constructing and reporting results for at least three proxies spanning different economic channels (volume, volatility, and spreads or analyst-based measures).

**2. Account for Vietnam-specific microstructure.** Daily price limits, T+2 settlement, foreign ownership constraints, and the order-driven market structure all affect DIVOP properties. Flag limit-hit days, include exchange fixed effects, and control for foreign ownership.

**3. Use Vietnam as a natural laboratory for Miller (1977).** The absence of short selling through 2024 and the dominance of retail investors create conditions that closely match Miller's theoretical setting. The introduction of short selling in 2025 creates a natural experiment for examining how relaxation of short-sale constraints affects the DIVOP-return relation.

**4. Control for analyst coverage when using DISP measures.** With typical coverage of 5--10 analysts per firm, forecast dispersion is estimated with greater noise than in developed markets. Always include the number of analysts as a control variable and conduct robustness checks with varying minimum-analyst thresholds.

**5. Consider constructing a composite index.** When researchers need a single summary measure of disagreement, the PCA-based composite index described in @sec-applications provides a principled approach to aggregating information across the individual proxies. The first principal component typically explains 30--50% of the common variation in the battery of DIVOP measures.

**6. Winsorize aggressively.** Several DIVOP proxies (particularly DISP1, Amihud ILLIQ, and SUV) exhibit extreme outliers in the Vietnamese data. Winsorization at the 1st and 99th percentiles (or even the 2nd and 98th for DISP1) is essential for obtaining reliable regression results.

**7. Be cautious about causal inference.** DIVOP proxies are endogenous: they respond to the same firm characteristics (size, leverage, growth) that also affect returns. Researchers should use appropriate controls, consider instrumental variables where feasible, and be explicit about the limitations of their identification strategy.

The DIVOP framework is particularly relevant for the Vietnamese market at this point in its development.
As the market matures toward potential FTSE Emerging Market reclassification, as short selling becomes more widely available, and as institutional investor participation grows, the dynamics of opinion divergence and its pricing implications are likely to evolve significantly. The methodology presented in this chapter provides researchers with the tools to document and analyze these changes as they unfold.
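
The winsorization step in recommendation 6 can be sketched as follows. This is a minimal illustration rather than the chapter's pipeline: the helper name `winsorize_by_date`, its parameters, and the toy panel are hypothetical, and the choice to clip within each date (rather than over the pooled sample) is an assumption made so that the treatment of outliers does not depend on market-wide shifts over time.

```python
import numpy as np
import pandas as pd


def winsorize_by_date(df, cols, lower=0.01, upper=0.99, date_col="date"):
    """Clip each column at its within-date lower/upper quantiles.

    Hypothetical helper: returns a copy and leaves the input unchanged.
    """
    out = df.copy()
    for col in cols:
        grouped = out.groupby(date_col)[col]
        # Broadcast each date's quantile back to every row of that date
        lo = grouped.transform(lambda s: s.quantile(lower))
        hi = grouped.transform(lambda s: s.quantile(upper))
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out


# Toy panel (made-up data): two dates, 200 stocks each, one planted outlier
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03"]).repeat(200),
    "amihud_daily": rng.lognormal(mean=0.0, sigma=1.0, size=400),
})
toy.loc[0, "amihud_daily"] = 1e6

clean = winsorize_by_date(toy, ["amihud_daily"])
print(toy["amihud_daily"].max(), clean["amihud_daily"].max())
```

Tighter bounds for a single proxy, such as the 2nd and 98th percentiles suggested for DISP1, would be a second call with `lower=0.02, upper=0.98` on that column only.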