34  Institutional Trades, Flows, and Turnover Ratios

Institutional investors play a pivotal role in price discovery, corporate governance, and market liquidity. Understanding how institutions trade and how much they trade provides insights into both asset pricing dynamics and the real effects of institutional monitoring. The seminal work of Grinblatt, Titman, and Wermers (1995) on mutual fund momentum trading, Wermers (2000) on fund performance decomposition, and Yan (2008) on the relationship between turnover and future returns all rely on accurately measured institutional trades, flows, and turnover.

In the United States, this research is enabled by the mandatory quarterly 13F filing system administered by the Securities and Exchange Commission (SEC). Every institutional investment manager with at least $100 million in qualifying assets must disclose its equity holdings within 45 days of each calendar quarter end. The Thomson Reuters (now Refinitiv) 13F database, accessible through WRDS, provides the canonical data infrastructure for this literature.

Vietnam’s equity market presents a fundamentally different institutional landscape. This chapter adapts the core methodology for the Vietnamese context, addressing five critical differences:

  1. Disclosure regime. Vietnam has no 13F-equivalent mandatory quarterly filing. Ownership disclosure is a patchwork of event-driven reports (threshold crossings at 5%, 10%, etc.), annual/semi-annual reports with shareholder registers, and daily foreign ownership tracking by exchanges.

  2. Corporate actions. Vietnamese firms issue stock dividends and bonus shares at extremely high rates compared to US firms. A firm might issue 20-30% bonus shares in a single year, fundamentally altering the share count. Share adjustment is therefore critical and nontrivial.

  3. Foreign ownership limits (FOLs). Binding foreign ownership ceilings (typically 49% for most sectors, 30% for banking, and 0% for certain restricted sectors) create a unique institutional constraint. When a stock approaches its FOL, foreign buying becomes mechanically restricted, distorting standard trade inference.

  4. State ownership. The Vietnamese government retains significant ownership in many listed firms through the State Capital Investment Corporation (SCIC) and other state entities. This creates a distinct ownership category not present in the US 13F data.

  5. Market microstructure. Daily price limits (\(\pm 7\%\) on HOSE, \(\pm 10\%\) on HNX, \(\pm 15\%\) on UPCOM), T+2 settlement, and the absence of short-selling all affect how institutional trades translate into market outcomes.

34.1 Measuring Institutional Ownership and Trading

The measurement of institutional ownership and trading activity has been a central concern in empirical finance since Gompers and Metrick (2001) documented the rise of institutional investors. The approach relies on comparing holdings snapshots across consecutive reporting periods to infer trades. If manager \(j\) holds \(h_{j,i,t}\) shares of stock \(i\) at time \(t\), then the inferred trade is:

\[ \Delta h_{j,i,t} = h_{j,i,t} - h_{j,i,t-1} \tag{34.1}\]

where \(\Delta h_{j,i,t} > 0\) indicates a buy and \(\Delta h_{j,i,t} < 0\) indicates a sale. This simple differencing approach requires that holdings are observed at regular intervals (e.g., quarterly), share counts are adjusted for corporate actions between reporting dates, and entry and exit from the dataset are handled appropriately.
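As a minimal sketch of this differencing (with hypothetical holdings; the column names here are illustrative, not DataCore.vn fields), the inferred trade is a grouped first difference:

```python
import pandas as pd

# Hypothetical quarterly holdings snapshots for two managers in one stock
holdings = pd.DataFrame({
    'manager': ['F1'] * 3 + ['F2'] * 3,
    'ticker':  ['VNM'] * 6,
    'qdate':   pd.to_datetime(
        ['2023-03-31', '2023-06-30', '2023-09-30'] * 2
    ),
    'shares':  [1000, 1500, 1200, 900, 800, 800],
})

holdings = holdings.sort_values(['manager', 'ticker', 'qdate'])
# Equation 34.1: trade = h_{j,i,t} - h_{j,i,t-1}
holdings['trade'] = (
    holdings.groupby(['manager', 'ticker'])['shares'].diff()
)
print(holdings[['manager', 'qdate', 'shares', 'trade']])
```

The first snapshot of each manager-stock pair has no lag, so its trade is undefined; the full pipeline below resolves such entry rows separately.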

Chen, Hong, and Stein (2002) introduced the concept of ownership breadth (i.e., the number of institutions holding a stock) and showed that changes in breadth predict future returns. Sias (2004) decomposed institutional demand into a herding component and an information component. Yan (2008) linked fund turnover to information-based trading and documented that high-turnover funds outperform, challenging the view that turnover reflects noise trading.

34.2 Trade Classification

Table 34.1 shows four categories of trades:

Table 34.1: Trade Classification Taxonomy

| Code | Type | Description |
|------|------|-------------|
| \(+1\) | Initiating Buy | Manager enters a new position |
| \(+2\) | Incremental Buy | Manager increases an existing position |
| \(-1\) | Terminating Sale | Manager completely exits a position |
| \(-2\) | Regular Sale | Manager reduces an existing position |

This classification is informative because initiating buys and terminating sales represent discrete portfolio decisions with different information content from marginal position adjustments (Alexander, Cici, and Gibson 2007).
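As an illustrative sketch (the helper below is hypothetical, not part of the chapter's pipeline), the taxonomy maps a pair of consecutive holdings to a code:

```python
def classify_trade(prev_shares: float, curr_shares: float) -> int:
    """Map a (previous, current) holding pair to a Table 34.1 code;
    0 means the position is unchanged."""
    if prev_shares == 0 and curr_shares > 0:
        return 1      # Initiating Buy
    if prev_shares > 0 and curr_shares == 0:
        return -1     # Terminating Sale
    if curr_shares > prev_shares:
        return 2      # Incremental Buy
    if curr_shares < prev_shares:
        return -2     # Regular Sale
    return 0          # no trade

print([classify_trade(p, c)
       for p, c in [(0, 100), (100, 150), (150, 0), (150, 100)]])
```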

34.3 Turnover Measures

Three standard turnover definitions have been used in the literature:

Carhart (1997) Turnover. The minimum of aggregate buys and sales, normalized by average assets:

\[ \text{Turnover}^{C}_{j,t} = \frac{\min\left(\sum_i B_{j,i,t},\, \sum_i S_{j,i,t}\right)} {\frac{1}{2}\left(A_{j,t} + A_{j,t-1}\right)} \tag{34.2}\]

where \(B_{j,i,t}\) and \(S_{j,i,t}\) are the dollar values of buys and sales of stock \(i\) by manager \(j\) in quarter \(t\), and \(A_{j,t}\) is total portfolio assets (Carhart 1997).

Flow-Adjusted Turnover. Adds back the absolute value of net flows to account for flow-driven trading:

\[ \text{Turnover}^{F}_{j,t} = \frac{\min\left(\sum_i B_{j,i,t},\, \sum_i S_{j,i,t}\right) + |\text{NetFlow}_{j,t}|} {A_{j,t-1}} \tag{34.3}\]

Symmetric Turnover. Uses the sum of buys and sales minus the absolute net flow:

\[ \text{Turnover}^{S}_{j,t} = \frac{\sum_i B_{j,i,t} + \sum_i S_{j,i,t} - |\text{NetFlow}_{j,t}|} {A_{j,t-1}} \tag{34.4}\]

The relationship between these measures depends on the correlation between discretionary trading and flow-induced trading (Pástor and Stambaugh 2003).
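A back-of-the-envelope sketch of Equations 34.2-34.4 for a single manager-quarter, with all dollar figures hypothetical:

```python
def turnover_measures(buys, sales, net_flow, assets_t, assets_tm1):
    """Return (Carhart, flow-adjusted, symmetric) turnover for one
    manager-quarter; buys and sales are aggregated dollar amounts."""
    min_bs = min(buys, sales)
    carhart = min_bs / (0.5 * (assets_t + assets_tm1))       # Eq. 34.2
    flow_adj = (min_bs + abs(net_flow)) / assets_tm1         # Eq. 34.3
    symmetric = (buys + sales - abs(net_flow)) / assets_tm1  # Eq. 34.4
    return carhart, flow_adj, symmetric

# A fund that bought 120, sold 80, and received net inflows of 40
c, f, s = turnover_measures(buys=120.0, sales=80.0, net_flow=40.0,
                            assets_t=1040.0, assets_tm1=1000.0)
print(c, f, s)
```

Note that the Carhart measure normalizes by average assets while the other two use beginning-of-quarter assets, exactly as in the definitions above.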

34.4 Institutional Ownership in Emerging Markets

The emerging markets literature has documented several stylized facts about institutional ownership that differ from developed market findings. Aggarwal et al. (2011) documented that foreign institutional ownership improves corporate governance in emerging markets. For Vietnam specifically, Phung and Mishra (2016) examined the relationship between ownership structure and firm performance, while Vo (2015) studied the impact of foreign ownership on stock market liquidity.

34.5 Net Flows and Performance Attribution

Net flows measure the dollar amount of new money entering or leaving a fund:

\[ \text{NetFlow}_{j,t} = A_{j,t} - A_{j,t-1}(1 + R_{j,t}^p) \tag{34.5}\]

where \(R_{j,t}^p\) is the portfolio return. This decomposition, due to Sirri and Tufano (1998), separates changes in fund assets into investment returns and investor capital allocation decisions. Coval and Stafford (2007) showed that flow-driven trades create price pressure, with fire sales by funds experiencing redemptions generating significant negative abnormal returns.
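Equation 34.5 is a one-liner in code; a sketch with hypothetical fund values:

```python
def net_flow(assets_t, assets_tm1, ret_p):
    """Equation 34.5: NetFlow = A_t - A_{t-1} * (1 + R_p)."""
    return assets_t - assets_tm1 * (1.0 + ret_p)

# Assets grew from 1,000 to 1,150 while the portfolio returned 5%:
# 1,000 * 1.05 = 1,050 is explained by performance; the rest is new money.
flow = net_flow(assets_t=1150.0, assets_tm1=1000.0, ret_p=0.05)
print(flow)
```

A positive value is a net inflow; redemptions make it negative, which is the margin Coval and Stafford (2007) exploit.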

35 Data Infrastructure

Table 35.1 summarizes the datasets used in this chapter.

Table 35.1: DataCore.vn Datasets Used in This Chapter

| Dataset | Content | Frequency | Key Variables |
|---------|---------|-----------|---------------|
| Stock Prices | Daily/monthly OHLCV | Daily | ticker, date, close, adjusted_close, volume, shares_outstanding |
| Ownership Structure | Shareholder composition | Quarterly/Annual | ticker, date, shareholder_name, shares_held, pct, type |
| Major Shareholders | Holders \(\geq\) 5% | Event-driven | ticker, date, shareholder_name, shares, is_foreign, is_state |
| Corporate Actions | Splits, dividends, bonus | Event-driven | ticker, ex_date, action_type, ratio |
| Company Profile | Sector, exchange, FOL | Static/Annual | ticker, exchange, industry, listing_date, fol_limit |
| Foreign Ownership | Daily foreign tracking | Daily | ticker, date, foreign_shares, foreign_pct, fol_limit |
| Fund Holdings | Fund portfolio snapshots | Semi-annual | fund_name, report_date, ticker, shares_held, market_value |

35.1 Data Reader Class

We begin by defining a unified data reader that handles file loading, date parsing, and basic validation:

from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Tuple

import numpy as np
import pandas as pd


@dataclass
class DataCoreReader:
    """
    Unified reader for DataCore.vn datasets stored locally.
    
    Supports Parquet (recommended) and CSV formats. Implements
    lazy loading with caching to minimize memory footprint.
    
    Parameters
    ----------
    data_dir : str or Path
        Directory containing DataCore.vn data files.
    file_format : str
        File format: 'parquet' or 'csv'.
    
    Examples
    --------
    >>> dc = DataCoreReader('/data/datacore', file_format='parquet')
    >>> prices = dc.prices
    >>> ownership = dc.ownership
    """
    data_dir: Path
    file_format: str = 'parquet'
    _cache: Dict[str, pd.DataFrame] = field(
        default_factory=dict, repr=False
    )
    
    FILE_MAP: Dict[str, str] = field(default_factory=lambda: {
        'prices': 'stock_prices',
        'ownership': 'ownership_structure',
        'major_shareholders': 'major_shareholders',
        'corporate_actions': 'corporate_actions',
        'company_profile': 'company_profile',
        'financials': 'financial_statements',
        'foreign_ownership': 'foreign_ownership',
        'fund_holdings': 'fund_holdings',
    }, repr=False)
    
    def __post_init__(self):
        self.data_dir = Path(self.data_dir)
        if not self.data_dir.exists():
            raise FileNotFoundError(
                f"Data directory not found: {self.data_dir}"
            )
    
    def _read(self, key: str) -> pd.DataFrame:
        """Read and cache a dataset with automatic date parsing."""
        if key in self._cache:
            return self._cache[key]
        
        fname = self.FILE_MAP.get(key, key)
        filepath = self.data_dir / f"{fname}.{self.file_format}"
        
        if not filepath.exists():
            raise FileNotFoundError(
                f"Dataset not found: {filepath}\n"
                f"Available: "
                f"{list(self.data_dir.glob(f'*.{self.file_format}'))}"
            )
        
        if self.file_format == 'parquet':
            df = pd.read_parquet(filepath)
        else:
            df = pd.read_csv(filepath)  # date columns are parsed below
        
        # Auto-detect and parse date columns
        date_cols = [
            'date', 'ex_date', 'record_date', 'period',
            'report_date', 'listing_date'
        ]
        for col in df.columns:
            if col.lower() in date_cols or 'date' in col.lower():
                try:
                    df[col] = pd.to_datetime(df[col])
                except (ValueError, TypeError):
                    pass
        
        self._cache[key] = df
        print(f"  Loaded {key}: {len(df):,} rows x {len(df.columns)} cols")
        return df
    
    @property
    def prices(self) -> pd.DataFrame:
        return self._read('prices')
    
    @property
    def ownership(self) -> pd.DataFrame:
        return self._read('ownership')
    
    @property
    def major_shareholders(self) -> pd.DataFrame:
        return self._read('major_shareholders')
    
    @property
    def corporate_actions(self) -> pd.DataFrame:
        return self._read('corporate_actions')
    
    @property
    def company_profile(self) -> pd.DataFrame:
        return self._read('company_profile')
    
    @property
    def foreign_ownership(self) -> pd.DataFrame:
        return self._read('foreign_ownership')
    
    @property
    def fund_holdings(self) -> pd.DataFrame:
        return self._read('fund_holdings')
    
    def clear_cache(self):
        n = len(self._cache)
        self._cache.clear()
        print(f"  Cleared {n} cached datasets")

# Initialize:
# dc = DataCoreReader('/path/to/datacore_data', file_format='parquet')

36 Stock Price and Return Processing

The first step processes stock data to obtain adjusted prices, shares outstanding, and quarterly returns.

36.1 Price Data Extraction and Adjustment

Vietnamese stock data requires careful adjustment for frequent corporate actions. Unlike in the US, where CRSP provides cumulative adjustment factors (cfacpr, cfacshr), in Vietnam we must construct adjustment factors from the corporate actions history.

Note: Vietnamese Corporate Actions

Vietnamese firms commonly execute the following corporate actions, each requiring share count and/or price adjustment:

  • Stock dividend (co tuc bang co phieu): e.g., 20% stock dividend means 100 shares become 120 shares
  • Bonus shares (co phieu thuong): free shares distributed from retained earnings
  • Rights issue (phat hanh quyen mua): right to buy new shares at a discount
  • Stock split/reverse split (chia/gop co phieu): rare but occasionally used

def build_adjustment_factors(
    corporate_actions: pd.DataFrame,
) -> pd.DataFrame:
    """
    Construct cumulative share adjustment factors from corporate actions.
    
    This is the Vietnamese equivalent of CRSP's cfacshr factor. For each
    ticker, we compute a cumulative product of adjustment ratios from
    corporate actions, working forward in time.
    
    The adjustment factor at date t converts historical share counts to
    be comparable with current (post-action) share counts:
    
        shares_adjusted_t = shares_raw_t * cfacshr_t
    
    Parameters
    ----------
    corporate_actions : pd.DataFrame
        Corporate actions with columns: ticker, ex_date, action_type,
        ratio. The ratio field represents:
        - Stock dividend 20%: ratio = 1.20
        - 2:1 stock split: ratio = 2.00
        - Bonus shares 10%: ratio = 1.10
    
    Returns
    -------
    pd.DataFrame
        Adjustment factors: ticker, ex_date, cfacshr (cumulative).
    """
    share_actions = corporate_actions[
        corporate_actions['action_type'].isin([
            'stock_dividend', 'bonus_shares', 'stock_split',
            'reverse_split', 'rights_issue'
        ])
    ].copy()
    
    if share_actions.empty:
        return pd.DataFrame(columns=['ticker', 'ex_date', 'cfacshr'])
    
    share_actions = share_actions.sort_values(['ticker', 'ex_date'])
    
    share_actions['cfacshr'] = (
        share_actions
        .groupby('ticker')['ratio']
        .cumprod()
    )
    
    return share_actions[['ticker', 'ex_date', 'cfacshr']].reset_index(
        drop=True
    )


def get_cfacshr_at_date(
    ticker: str,
    date: pd.Timestamp,
    adj_factors: pd.DataFrame,
) -> float:
    """
    Look up the cumulative share adjustment factor for a given
    ticker and date. Returns 1.0 if no corporate actions occurred.
    """
    mask = (
        (adj_factors['ticker'] == ticker) &
        (adj_factors['ex_date'] <= date)
    )
    subset = adj_factors.loc[mask]
    
    if subset.empty:
        return 1.0
    return subset.iloc[-1]['cfacshr']


def adjust_shares_between_dates(
    shares: float,
    ticker: str,
    date_from: pd.Timestamp,
    date_to: pd.Timestamp,
    adj_factors: pd.DataFrame,
) -> float:
    """
    Adjust a share count observed at date_from to be comparable
    with shares observed at date_to, accounting for all intervening
    corporate actions.
    
    Example
    -------
    >>> # Investor held 1000 shares on 2023-01-01
    >>> # A 20% stock dividend occurred on 2023-03-15
    >>> adjust_shares_between_dates(
    ...     1000, 'VNM',
    ...     pd.Timestamp('2023-01-01'),
    ...     pd.Timestamp('2023-06-30'), adj_factors
    ... )
    1200.0
    """
    factor_from = get_cfacshr_at_date(ticker, date_from, adj_factors)
    factor_to = get_cfacshr_at_date(ticker, date_to, adj_factors)
    relative_factor = factor_to / factor_from
    return shares * relative_factor

36.2 Monthly and Quarterly Price Processing

def process_prices(
    prices: pd.DataFrame,
    adj_factors: pd.DataFrame,
    begdate: str = '2010-01-01',
    enddate: str = '2024-12-31',
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """
    Process raw DataCore.vn price data into analysis-ready format.
    
    Block logic:
    1. Filter to date range
    2. Compute adjusted prices and shares outstanding
    3. Compute quarterly compounded returns
    4. Create forward quarterly returns (shifted one quarter)
    
    Parameters
    ----------
    prices : pd.DataFrame
        Raw price data with: ticker, date, close, adjusted_close,
        volume, shares_outstanding.
    adj_factors : pd.DataFrame
        Corporate action adjustment factors.
    begdate, enddate : str
        Sample period boundaries.
    
    Returns
    -------
    Tuple[pd.DataFrame, pd.DataFrame]
        (price_quarterly, qret): quarter-end observations with
        adjusted price, total shares, and forward quarterly return.
    """
    price = prices[
        (prices['date'] >= begdate) & (prices['date'] <= enddate)
    ].copy()
    
    # Month-end and quarter-end dates
    price['mdate'] = price['date'] + pd.offsets.MonthEnd(0)
    price['qdate'] = price['date'] + pd.offsets.QuarterEnd(0)
    
    # Adjusted price
    if 'adjusted_close' in price.columns:
        price['p'] = price['adjusted_close']
    else:
        price['p'] = price['close']
    
    # Total shares outstanding
    price['tso'] = price['shares_outstanding']
    
    # Market capitalization (millions VND)
    price['mcap'] = price['p'] * price['tso'] / 1e6
    
    # Filter out zero shares
    price = price[price['tso'] > 0].copy()
    
    # Compute daily returns if not present
    if 'ret' not in price.columns:
        price = price.sort_values(['ticker', 'date'])
        price['ret'] = price.groupby('ticker')['p'].pct_change()
    
    price['ret'] = price['ret'].fillna(0)
    price['logret'] = np.log(1 + price['ret'])
    
    # ---- Quarterly compounded returns ----
    qret = (
        price
        .groupby(['ticker', 'qdate'])['logret']
        .sum()
        .reset_index()
    )
    qret['qret'] = np.exp(qret['logret']) - 1
    
    # Shift qdate back one quarter: make qret a *forward* return
    qret['qdate'] = qret['qdate'] + pd.offsets.QuarterEnd(-1)
    qret = qret.drop(columns=['logret'])
    
    # ---- Quarter-end observations (last trading day per quarter) ----
    # Filtering on qdate == mdate alone would keep every daily row in
    # the quarter's final month; take the last observation per group.
    price_q = (
        price.sort_values(['ticker', 'date'])
        .groupby(['ticker', 'qdate'], as_index=False)
        .last()
    )
    price_q = price_q[['qdate', 'ticker', 'p', 'tso', 'mcap']].copy()
    
    # Merge forward quarterly return
    price_q = price_q.merge(qret, on=['ticker', 'qdate'], how='left')
    
    # Build cfacshr lookup at each quarter-end
    price_q['cfacshr'] = price_q.apply(
        lambda row: get_cfacshr_at_date(
            row['ticker'], row['qdate'], adj_factors
        ),
        axis=1
    )
    
    return price_q, qret

Tip: Performance Optimization

The get_cfacshr_at_date function uses a row-wise lookup which can be slow for large datasets. For production use with millions of rows, vectorize using pd.merge_asof():

price_q = pd.merge_asof(
    price_q.sort_values('qdate'),
    adj_factors.sort_values('ex_date'),
    by='ticker',
    left_on='qdate',
    right_on='ex_date',
    direction='backward'
).fillna({'cfacshr': 1.0})

The output is a quarterly panel of stock-level observations (Table 36.1).

Table 36.1: Quarter-End Price Panel Variables

| Variable | Description |
|----------|-------------|
| ticker | Stock ticker (e.g., VNM, VCB, FPT) |
| qdate | Quarter-end date |
| p | Adjusted closing price (VND) |
| tso | Total shares outstanding |
| mcap | Market capitalization (millions VND) |
| qret | Forward quarterly compounded return |
| cfacshr | Cumulative share adjustment factor |

37 Ownership Data Processing

37.1 Ownership Taxonomy

We define a classification system for Vietnamese shareholders that maps to the categories available in DataCore.vn:

class OwnershipType:
    """
    Vietnamese ownership type classification.
    
    Vietnam's ownership structure is fundamentally different from the US:
    
    - **State** (Nha nuoc): SCIC, ministries, state-owned parents
    - **Foreign Institutional** (To chuc nuoc ngoai): foreign funds,
      ETFs, pension funds, insurance, sovereign wealth funds
    - **Domestic Institutional** (To chuc trong nuoc): Vietnamese
      securities companies, fund managers, banks, insurance
    - **Individual** (Ca nhan): retail investors (domestic + foreign)
    - **Treasury** (Co phieu quy): company repurchases
    """
    
    STATE = 'State'
    FOREIGN_INST = 'Foreign Institutional'
    DOMESTIC_INST = 'Domestic Institutional'
    INDIVIDUAL = 'Individual'
    TREASURY = 'Treasury'
    
    INSTITUTIONAL = [FOREIGN_INST, DOMESTIC_INST]
    ALL_INSTITUTIONAL = [STATE, FOREIGN_INST, DOMESTIC_INST]
    ALL_TYPES = [STATE, FOREIGN_INST, DOMESTIC_INST, INDIVIDUAL, TREASURY]
    
    STATE_KEYWORDS = [
        'scic', 'state capital', 'bo', 'ubnd', 'tong cong ty',
        'nha nuoc', 'state', 'government', "people's committee",
        'ministry', 'vietnam national', 'vnpt', 'evn', 'pvn',
    ]
    
    FOREIGN_KEYWORDS = [
        'fund', 'investment', 'capital', 'asset management',
        'securities', 'gic', 'templeton', 'dragon capital',
        'vinacapital', 'mekong capital', 'kb securities',
        'mirae asset', 'samsung', 'jp morgan', 'goldman',
        'blackrock', 'vanguard', 'aberdeen', 'hsbc',
    ]
    
    @classmethod
    def classify(cls, row: pd.Series) -> str:
        """Classify based on explicit flags, then keyword fallback."""
        if pd.notna(row.get('is_state')) and row['is_state']:
            return cls.STATE
        if pd.notna(row.get('is_foreign')) and row['is_foreign']:
            if pd.notna(row.get('is_institution')) and row['is_institution']:
                return cls.FOREIGN_INST
            return cls.INDIVIDUAL
        if pd.notna(row.get('is_institution')) and row['is_institution']:
            return cls.DOMESTIC_INST
        
        name = str(row.get('shareholder_name', '')).lower()
        if any(kw in name for kw in cls.STATE_KEYWORDS):
            return cls.STATE
        if any(kw in name for kw in cls.FOREIGN_KEYWORDS):
            return cls.FOREIGN_INST
        
        return cls.INDIVIDUAL

37.2 Building the Holdings Panel

We construct the holdings panel (i.e., the Vietnamese equivalent of merging the 13F Type 1 and Type 3 datasets). The key steps are:

  1. Identify the first available vintage for each shareholder-stock-report date combination.
  2. Compute reporting gaps to flag first and last reports.
  3. Classify shareholders.
  4. Adjust shares for corporate actions.

def build_holdings_panel(
    ownership: pd.DataFrame,
    adj_factors: pd.DataFrame,
    price_q: pd.DataFrame,
    company_profile: pd.DataFrame,
    begdate: str = '2010-01-01',
    enddate: str = '2024-12-31',
) -> pd.DataFrame:
    """
    Construct the institutional holdings panel from DataCore.vn
    ownership data.
    """
    own = ownership.copy()
    
    # Align to quarter-end
    own['rdate'] = own['date'] + pd.offsets.QuarterEnd(0)
    own['fdate'] = own['date']
    
    own = own[
        (own['rdate'] >= begdate) & (own['rdate'] <= enddate)
    ].copy()
    
    # Keep earliest vintage per shareholder-ticker-rdate
    own = own.sort_values(
        ['shareholder_name', 'ticker', 'rdate', 'fdate']
    )
    fst_vint = (
        own
        .groupby(['shareholder_name', 'ticker', 'rdate'])
        .first()
        .reset_index()
    )
    
    # ---- Reporting gaps for first/last flags ----
    fst_vint = fst_vint.sort_values(
        ['shareholder_name', 'ticker', 'rdate']
    )
    
    grp = fst_vint.groupby(['shareholder_name', 'ticker'])
    fst_vint['lag_rdate'] = grp['rdate'].shift(1)
    
    fst_vint['qtr_gap'] = fst_vint.apply(
        lambda r: (
            (r['rdate'].to_period('Q')
             - r['lag_rdate'].to_period('Q')).n
            if pd.notna(r['lag_rdate']) else np.nan
        ),
        axis=1
    )
    
    fst_vint['first_report'] = (
        fst_vint['qtr_gap'].isna() | (fst_vint['qtr_gap'] >= 2)
    )
    
    # Last report flag (forward gap): shift(-1) on the ascending sort
    # gives the next report date. (Re-sorting descending and reusing
    # the stale groupby object would silently return the lag again.)
    fst_vint['lead_rdate'] = grp['rdate'].shift(-1)
    
    fst_vint['lead_gap'] = fst_vint.apply(
        lambda r: (
            (r['lead_rdate'].to_period('Q')
             - r['rdate'].to_period('Q')).n
            if pd.notna(r['lead_rdate']) else np.nan
        ),
        axis=1
    )
    
    fst_vint['last_report'] = (
        fst_vint['lead_gap'].isna() | (fst_vint['lead_gap'] >= 2)
    )
    
    fst_vint = fst_vint.drop(
        columns=['lag_rdate', 'qtr_gap', 'lead_rdate', 'lead_gap'],
        errors='ignore'
    )
    
    # ---- Classify shareholders ----
    fst_vint['owner_type'] = fst_vint.apply(
        OwnershipType.classify, axis=1
    )
    
    # ---- Adjust shares for corporate actions ----
    fst_vint = fst_vint.merge(
        price_q[['ticker', 'qdate', 'cfacshr']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    
    fst_vint['shares_adj'] = (
        fst_vint['shares_held'] * fst_vint['cfacshr']
    )
    fst_vint = fst_vint[fst_vint['shares_adj'] > 0].copy()
    
    fst_vint = fst_vint.drop_duplicates(
        subset=['shareholder_name', 'ticker', 'rdate']
    )
    
    # Merge company profile
    if company_profile is not None:
        fst_vint = fst_vint.merge(
            company_profile[['ticker', 'exchange', 'fol_limit']]
            .drop_duplicates(),
            on='ticker',
            how='left'
        )
    
    cols = [
        'shareholder_name', 'ticker', 'rdate', 'fdate',
        'shares_held', 'shares_adj', 'owner_type',
        'first_report', 'last_report'
    ]
    if 'exchange' in fst_vint.columns:
        cols.extend(['exchange', 'fol_limit'])
    
    holdings = fst_vint[cols].copy()
    
    print(f"Holdings panel: {len(holdings):,} observations")
    print(f"  Shareholders: {holdings['shareholder_name'].nunique():,}")
    print(f"  Stocks: {holdings['ticker'].nunique():,}")
    print(f"  Quarters: {holdings['rdate'].nunique()}")
    
    return holdings

38 Institutional Ownership Metrics

Before computing trades, we establish the standard institutional ownership metrics that serve as both outputs and inputs to the trading analysis.

38.1 Institutional Ownership Ratio

The institutional ownership ratio (IO) for stock \(i\) at time \(t\) is:

\[ IO_{i,t} = \frac{\sum_{j \in \mathcal{J}} h_{j,i,t}}{TSO_{i,t}} \tag{38.1}\]

where \(\mathcal{J}\) is the set of institutional investors and \(TSO_{i,t}\) is total shares outstanding. In Vietnam, we compute separate ratios for each ownership type:

\[ IO_{i,t}^{\text{type}} = \frac{\sum_{j \in \mathcal{J}^{\text{type}}} h_{j,i,t}}{TSO_{i,t}}, \quad \text{type} \in \{\text{State}, \text{Foreign}, \text{Domestic}, \text{Individual}\} \tag{38.2}\]

def compute_io_ratios(
    holdings: pd.DataFrame,
    price_q: pd.DataFrame,
) -> pd.DataFrame:
    """Compute IO ratios by type for each stock-quarter."""
    agg = (
        holdings
        .groupby(['ticker', 'rdate', 'owner_type'])['shares_adj']
        .sum()
        .reset_index()
    )
    
    io_wide = agg.pivot_table(
        index=['ticker', 'rdate'],
        columns='owner_type',
        values='shares_adj',
        fill_value=0
    ).reset_index()
    
    io_wide.columns = [
        c if c in ['ticker', 'rdate']
        else f'shares_{c.lower().replace(" ", "_")}'
        for c in io_wide.columns
    ]
    
    io_wide = io_wide.merge(
        price_q[['ticker', 'qdate', 'tso']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    
    share_cols = [c for c in io_wide.columns if c.startswith('shares_')]
    for col in share_cols:
        ratio_name = col.replace('shares_', 'io_')
        io_wide[ratio_name] = io_wide[col] / io_wide['tso']
    
    inst_cols = [
        c for c in io_wide.columns
        if c.startswith('shares_')
        and 'individual' not in c
        and 'treasury' not in c
    ]
    io_wide['io_total_inst'] = (
        io_wide[inst_cols].sum(axis=1) / io_wide['tso']
    )
    
    return io_wide

38.2 Ownership Concentration: Herfindahl-Hirschman Index

The HHI measures ownership concentration:

\[ HHI_{i,t} = \sum_{j=1}^{N_{i,t}} \left(\frac{h_{j,i,t}}{\sum_{k=1}^{N_{i,t}} h_{k,i,t}}\right)^2 \tag{38.3}\]

where \(N_{i,t}\) is the number of shareholders. HHI ranges from \(1/N_{i,t}\) (equal) to 1 (single shareholder). In Vietnam, ownership tends to be highly concentrated due to large state and founding-family blocks.

def compute_hhi(holdings: pd.DataFrame) -> pd.DataFrame:
    """Compute HHI for each stock-quarter, overall and institutional."""
    def _hhi(shares: pd.Series) -> float:
        total = shares.sum()
        if total <= 0:
            return np.nan
        weights = shares / total
        return (weights ** 2).sum()
    
    hhi_overall = (
        holdings.groupby(['ticker', 'rdate'])['shares_adj']
        .apply(_hhi).reset_index()
        .rename(columns={'shares_adj': 'hhi_overall'})
    )
    
    inst = holdings[
        holdings['owner_type'].isin(OwnershipType.ALL_INSTITUTIONAL)
    ]
    hhi_inst = (
        inst.groupby(['ticker', 'rdate'])['shares_adj']
        .apply(_hhi).reset_index()
        .rename(columns={'shares_adj': 'hhi_institutional'})
    )
    
    return hhi_overall.merge(hhi_inst, on=['ticker', 'rdate'], how='left')

38.3 Ownership Breadth

Following Chen, Hong, and Stein (2002), ownership breadth is the number of institutional holders:

\[ \text{Breadth}_{i,t} = \#\{j : h_{j,i,t} > 0, \, j \in \mathcal{J}\} \tag{38.4}\]

The change in breadth predicts future returns:

\[ \Delta\text{Breadth}_{i,t} = \text{Breadth}_{i,t} - \text{Breadth}_{i,t-1} \tag{38.5}\]

def compute_breadth(holdings: pd.DataFrame) -> pd.DataFrame:
    """Compute ownership breadth and changes by type."""
    breadth = (
        holdings[
            holdings['owner_type'].isin(OwnershipType.ALL_INSTITUTIONAL)
        ]
        .groupby(['ticker', 'rdate', 'owner_type'])['shareholder_name']
        .nunique()
        .reset_index()
        .rename(columns={'shareholder_name': 'n_holders'})
    )
    
    breadth_wide = breadth.pivot_table(
        index=['ticker', 'rdate'],
        columns='owner_type',
        values='n_holders',
        fill_value=0
    ).reset_index()
    
    breadth_wide.columns = [
        c if c in ['ticker', 'rdate']
        else f'n_{c.lower().replace(" ", "_")}'
        for c in breadth_wide.columns
    ]
    
    n_cols = [c for c in breadth_wide.columns if c.startswith('n_')]
    breadth_wide['n_total_inst'] = breadth_wide[n_cols].sum(axis=1)
    
    breadth_wide = breadth_wide.sort_values(['ticker', 'rdate'])
    for col in n_cols + ['n_total_inst']:
        breadth_wide[f'd_{col}'] = (
            breadth_wide.groupby('ticker')[col].diff()
        )
    
    return breadth_wide

When a shareholder-stock combination drops out of the panel entirely, a terminating sale (\(\text{BS} = -1\)) is generated for the prior position, dated to the quarter after the last report.

For intermediate gaps (reports at \(t-2\) and \(t\) but not \(t-1\)), we split into:

  • A terminating sale at \(t-1\) of \(-h_{j,i,t-2}^{\text{adj}}\);
  • An initiating buy at \(t\) of \(h_{j,i,t}\).

38.4 Implementation

def compute_trades(
    holdings: pd.DataFrame,
    adj_factors: pd.DataFrame,
) -> pd.DataFrame:
    """
    Compute institutional trades from holdings panel.
    
    Uses vectorized conditional logic (NOT apply()) for performance.
    
    Algorithm:
    1. Sort holdings by shareholder, ticker, quarter
    2. Compute lagged holdings and reporting gaps
    3. Apply modified trade logic based on first_report, gap
    4. Handle terminating sales and intermediate gaps
    5. Append all trade records
    """
    t1 = holdings.sort_values(
        ['shareholder_name', 'ticker', 'rdate']
    ).copy()
    
    # Previous holding quarter and shares
    grp = t1.groupby(['shareholder_name', 'ticker'])
    t1['phrdate'] = grp['rdate'].shift(1)
    t1['pshares_adj'] = grp['shares_adj'].shift(1)
    
    # Raw trade
    t1['trade'] = t1['shares_adj'] - t1['pshares_adj']
    
    # Quarter gap
    t1['qtrgap'] = t1.apply(
        lambda r: (
            (r['rdate'].to_period('Q')
             - r['phrdate'].to_period('Q')).n
            if pd.notna(r['phrdate']) else np.nan
        ),
        axis=1
    )
    
    # Boundary detection keys
    t1['l_key'] = (
        t1['shareholder_name'] + '_' + t1['ticker']
    ).shift(1)
    t1['n_key'] = (
        t1['shareholder_name'] + '_' + t1['ticker']
    ).shift(-1)
    t1['curr_key'] = t1['shareholder_name'] + '_' + t1['ticker']
    
    # ---- Vectorized trade classification ----
    is_new = (t1['curr_key'] != t1['l_key'])
    not_first = ~t1['first_report']
    consec = (t1['qtrgap'] == 1)
    gap = (t1['qtrgap'] != 1) & t1['qtrgap'].notna()
    
    cond1   = is_new
    cond1_1 = is_new & not_first
    cond2_1 = (~is_new) & not_first & consec
    cond2_2 = (~is_new) & not_first & gap
    
    # Modified trade amounts
    t1['modtrade'] = t1['trade']
    t1.loc[cond1, 'modtrade'] = np.nan
    t1.loc[cond1_1, 'modtrade'] = t1.loc[cond1_1, 'shares_adj']
    t1.loc[cond2_1, 'modtrade'] = t1.loc[cond2_1, 'trade']
    t1.loc[cond2_2, 'modtrade'] = t1.loc[cond2_2, 'shares_adj']
    
    # Buy/sale classification
    t1['buysale'] = np.nan
    t1.loc[cond1_1, 'buysale'] = 1
    t1.loc[cond2_1, 'buysale'] = (
        2 * np.sign(t1.loc[cond2_1, 'trade'])
    )
    t1.loc[cond2_2, 'buysale'] = 1.5  # placeholder for split
    
    # ---- Handle intermediate gaps (buysale == 1.5) ----
    t2 = t1[t1['buysale'] == 1.5].copy()
    t2['rdate'] = t2['phrdate'] + pd.offsets.QuarterEnd(1)
    t2['buysale'] = -1
    t2['modtrade'] = -t2['pshares_adj']
    
    t1.loc[t1['buysale'] == 1.5, 'buysale'] = 1
    
    # ---- Terminating sales ----
    is_last_combo = (t1['curr_key'] != t1['n_key'])
    not_last_rpt = ~t1['last_report']
    
    t3 = t1[is_last_combo & not_last_rpt].copy()
    t3['rdate'] = t3['rdate'] + pd.offsets.QuarterEnd(1)
    t3['modtrade'] = -t3['shares_adj']
    t3['buysale'] = -1
    
    # ---- Combine ----
    trades = pd.concat([t1, t2, t3], ignore_index=True)
    trades = trades[
        (trades['modtrade'] != 0) &
        trades['modtrade'].notna() &
        trades['buysale'].notna()
    ].copy()
    
    trades = trades[[
        'rdate', 'shareholder_name', 'ticker', 'modtrade',
        'buysale', 'owner_type', 'first_report', 'last_report'
    ]].rename(columns={'modtrade': 'trade'})
    
    print(f"\nTrade computation complete:")
    print(f"  Total records: {len(trades):,}")
    print(f"  Initiating buys:  {(trades['buysale'] == 1).sum():,}")
    print(f"  Incremental buys: {(trades['buysale'] == 2).sum():,}")
    print(f"  Terminating sales:{(trades['buysale'] == -1).sum():,}")
    print(f"  Regular sales:    {(trades['buysale'] == -2).sum():,}")
    
    return trades

38.4.1 Trade Visualization

def plot_trade_distribution(trades: pd.DataFrame):
    """Plot time series of trade types by quarter."""
    bs_labels = {
        1: 'Initiating Buy', 2: 'Incremental Buy',
        -1: 'Terminating Sale', -2: 'Regular Sale'
    }
    trades = trades.copy()
    trades['trade_type'] = trades['buysale'].map(bs_labels)
    
    counts = (
        trades
        .groupby([pd.Grouper(key='rdate', freq='QE'), 'trade_type'])
        .size()
        .unstack(fill_value=0)
    )
    
    fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)
    
    buy_cols = [c for c in counts.columns if 'Buy' in c]
    counts[buy_cols].plot(
        kind='bar', stacked=True, ax=axes[0],
        color=['#1f77b4', '#aec7e8'], width=0.8
    )
    axes[0].set_title('Panel A: Institutional Purchases', fontweight='bold')
    axes[0].set_ylabel('Number of Trades')
    
    sale_cols = [c for c in counts.columns if 'Sale' in c]
    counts[sale_cols].plot(
        kind='bar', stacked=True, ax=axes[1],
        color=['#d62728', '#ff9896'], width=0.8
    )
    axes[1].set_title('Panel B: Institutional Sales', fontweight='bold')
    axes[1].set_ylabel('Number of Trades')
    
    for ax in axes:
        ax.tick_params(axis='x', rotation=45)
        for i, label in enumerate(ax.get_xticklabels()):
            if i % 4 != 0:
                label.set_visible(False)
    
    plt.tight_layout()
    plt.show()

# plot_trade_distribution(trades)
Figure 38.1
def plot_net_trading_by_type(trades: pd.DataFrame, price_q: pd.DataFrame):
    """Plot net trading volume by owner type over time."""
    _t = trades.merge(
        price_q[['ticker', 'qdate', 'p']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    _t['trade_vnd'] = _t['trade'] * _t['p'] / 1e9
    
    net = (
        _t
        .groupby([pd.Grouper(key='rdate', freq='QE'), 'owner_type'])
        ['trade_vnd'].sum()
        .unstack(fill_value=0)
    )
    
    fig, ax = plt.subplots(figsize=(12, 6))
    for col in net.columns:
        ax.plot(net.index, net[col], label=col,
                color=OWNER_COLORS.get(col, '#333'), linewidth=1.5)
    ax.axhline(y=0, color='black', linewidth=0.5)
    ax.set_title('Net Institutional Trading by Ownership Type',
                 fontweight='bold')
    ax.set_ylabel('Net Trading (Billions VND)')
    ax.legend(loc='best')
    plt.tight_layout()
    plt.show()

# plot_net_trading_by_type(trades, price_q)
Figure 38.2

39 Portfolio Assets, Flows, and Returns

This section computes total portfolio assets, aggregate buys and sales, and portfolio-level returns for each institutional investor.

39.1 Total Assets and Portfolio Returns

For each manager \(j\) and quarter \(t\), portfolio assets are:

\[ A_{j,t} = \sum_{i=1}^{N_{j,t}} h_{j,i,t} \cdot P_{i,t} \tag{39.1}\]

The portfolio return assuming buy-and-hold is:

\[ R_{j,t}^{p} = \frac{\sum_{i=1}^{N_{j,t}} h_{j,i,t} \cdot P_{i,t} \cdot r_{i,t+1}} {\sum_{i=1}^{N_{j,t}} h_{j,i,t} \cdot P_{i,t}} \tag{39.2}\]
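As a quick numeric check of Eq. 39.1 and Eq. 39.2 before the full implementation (figures are illustrative):

```python
import numpy as np

h = np.array([1_000.0, 2_000.0])   # holdings h_{j,i,t} (shares)
p = np.array([50.0, 25.0])         # prices P_{i,t}
r = np.array([0.10, -0.05])        # next-quarter returns r_{i,t+1}

value = h * p                      # position values: [50,000, 50,000]
A = value.sum()                    # Eq. 39.1: A_{j,t} = 100,000
Rp = (value * r).sum() / A         # Eq. 39.2: value-weighted buy-and-hold return

print(A, Rp)  # 100000.0 0.025
```

With equal position values, the portfolio return is the simple average 0.5(0.10) + 0.5(-0.05) = 0.025.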

def compute_assets_and_returns(
    holdings: pd.DataFrame,
    price_q: pd.DataFrame,
) -> pd.DataFrame:
    """Compute total portfolio assets and buy-and-hold returns."""
    _assets = holdings[
        ['shareholder_name', 'ticker', 'rdate', 'shares_adj']
    ].merge(
        price_q[['ticker', 'qdate', 'p', 'qret']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    
    _assets['hold_per_stock'] = _assets['shares_adj'] * _assets['p'] / 1e6
    _assets['next_value'] = (
        _assets['shares_adj'] * _assets['p'] * _assets['qret']
    )
    _assets['curr_value'] = _assets['shares_adj'] * _assets['p']
    
    assets = (
        _assets
        .groupby(['shareholder_name', 'rdate'])
        .agg(
            assets=('hold_per_stock', 'sum'),
            total_next=('next_value', 'sum'),
            total_curr=('curr_value', 'sum'),
        )
        .reset_index()
    )
    
    assets['pret'] = assets['total_next'] / assets['total_curr']
    assets = assets.drop(columns=['total_next', 'total_curr'])
    return assets

39.2 Aggregate Buys and Sales

Total buys and sales for manager \(j\) in quarter \(t\):

\[ B_{j,t} = \sum_{i : \Delta h > 0} \Delta h_{j,i,t} \cdot P_{i,t}, \qquad S_{j,t} = \sum_{i : \Delta h < 0} |\Delta h_{j,i,t}| \cdot P_{i,t} \tag{39.3}\]

The trade gain is:

\[ G_{j,t} = \sum_{i=1}^{N_{j,t}} \Delta h_{j,i,t} \cdot P_{i,t} \cdot r_{i,t+1} \tag{39.4}\]

def compute_buys_sales(
    trades: pd.DataFrame,
    price_q: pd.DataFrame,
) -> pd.DataFrame:
    """Compute aggregate buys, sales, trade gains per manager-quarter."""
    _flows = trades.merge(
        price_q[['ticker', 'qdate', 'p', 'qret']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    
    _flows['tbuys'] = (
        _flows['trade'] * (_flows['trade'] > 0).astype(float)
        * _flows['p'] / 1e6
    )
    _flows['tsales'] = (
        (-1) * _flows['trade'] * (_flows['trade'] < 0).astype(float)
        * _flows['p'] / 1e6
    )
    _flows['tgain'] = (
        _flows['trade'] * _flows['p'] * _flows['qret'] / 1e6
    )
    
    flows = (
        _flows
        .groupby(['shareholder_name', 'rdate'])
        .agg(
            tbuys=('tbuys', 'sum'),
            tsales=('tsales', 'sum'),
            tgain=('tgain', 'sum'),
        )
        .reset_index()
    )
    return flows

40 Net Flows and Turnover Ratios

40.1 Net Flows

Net flows separate capital allocation decisions from investment returns:

\[ \text{NetFlow}_{j,t} = A_{j,t} - A_{j,t-1}(1 + R_{j,t}^p) \tag{40.1}\]

Warning: Interpreting Net Flows in Vietnam

For state entities or corporate cross-holders, “net flows” do not necessarily reflect investment decisions. State ownership changes often result from government policy (equitization, divestment programs). Interpretation should account for institutional context.
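A one-line numeric illustration of Eq. 40.1 (hypothetical figures, billions of VND):

```python
# Assets of 100 at t-1 grow to 120 at t while the portfolio returned 8%
A_prev, A_curr, Rp = 100.0, 120.0, 0.08

# Eq. 40.1: growth not explained by the return is attributed to net inflows
netflow = A_curr - A_prev * (1 + Rp)
print(round(netflow, 6))  # 12.0
```

Of the 20-unit asset growth, 8 units are explained by performance; the remaining 12 are classified as net inflows.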

40.2 Three Turnover Measures
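Stated formally, with \(B_{j,t}\) and \(S_{j,t}\) from Eq. 39.3 and \(\text{NetFlow}_{j,t}\) from Eq. 40.1, the three measures implemented below are:

\[ \text{Turnover}^{(1)}_{j,t} = \frac{\min(B_{j,t}, S_{j,t})}{(A_{j,t} + A_{j,t-1})/2} \]

\[ \text{Turnover}^{(2)}_{j,t} = \frac{\min(B_{j,t}, S_{j,t}) + |\text{NetFlow}_{j,t}|}{A_{j,t-1}} \]

\[ \text{Turnover}^{(3)}_{j,t} = \frac{B_{j,t} + S_{j,t} - |\text{NetFlow}_{j,t}|}{A_{j,t-1}} \]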

def compute_aggregates(
    holdings: pd.DataFrame,
    assets: pd.DataFrame,
    flows: pd.DataFrame,
) -> pd.DataFrame:
    """
    Compute net flows and three turnover measures.
    
    1. Carhart (1997): min(buys, sales) / avg(assets)
    2. Flow-adjusted: [min(buys, sales) + |net flows|] / lag assets
    3. Symmetric: [buys + sales - |net flows|] / lag assets
    """
    report_flags = (
        holdings
        .groupby(['shareholder_name', 'rdate'])
        .agg(first_report=('first_report', 'any'),
             last_report=('last_report', 'any'))
        .reset_index()
    )
    
    agg = report_flags.merge(
        assets, on=['shareholder_name', 'rdate'], how='inner'
    )
    agg = agg.merge(
        flows, on=['shareholder_name', 'rdate'], how='left'
    )
    
    agg = agg.sort_values(['shareholder_name', 'rdate'])
    
    agg['assets_comp'] = agg['assets'] * (1 + agg['pret'].fillna(0))
    
    grp = agg.groupby('shareholder_name')
    agg['lassets_comp'] = grp['assets_comp'].shift(1)
    agg['lassets'] = grp['assets'].shift(1)
    
    # Trade gain return
    agg['tgainret'] = agg['tgain'] / (agg['tbuys'] + agg['tsales'])
    
    # Net flows
    agg['netflows'] = agg['assets'] - agg['lassets_comp']
    
    # Turnover 1: Carhart (1997)
    agg['turnover1'] = (
        agg[['tbuys', 'tsales']].min(axis=1) /
        agg[['assets', 'lassets']].mean(axis=1)
    )
    
    # Turnover 2: Flow-adjusted
    agg['turnover2'] = (
        (agg[['tbuys', 'tsales']].min(axis=1)
         + agg['netflows'].abs().fillna(0))
        / agg['lassets']
    )
    
    # Turnover 3: Symmetric
    agg['turnover3'] = (
        (agg['tbuys'].fillna(0) + agg['tsales'].fillna(0)
         - agg['netflows'].abs().fillna(0))
        / agg['lassets']
    )
    
    # Missing for first report
    first_mask = agg['first_report']
    for col in ['netflows', 'tgainret',
                'turnover1', 'turnover2', 'turnover3']:
        agg.loc[first_mask, col] = np.nan
    
    agg = agg.drop(columns=['assets_comp', 'lassets_comp', 'lassets'])
    
    print(f"\nAggregates: {len(agg):,} manager-quarters")
    print(f"  Turnover1 mean: {agg['turnover1'].mean():.4f}")
    print(f"  Turnover2 mean: {agg['turnover2'].mean():.4f}")
    print(f"  Turnover3 mean: {agg['turnover3'].mean():.4f}")
    
    return agg

40.2.1 Turnover Summary Statistics

Table 40.1: Summary statistics for three turnover measures across institutional investor types in Vietnam. Turnover 1 follows Carhart (1997), Turnover 2 adds back absolute net flows, and Turnover 3 uses the symmetric definition.
def turnover_summary_table(
    aggregates: pd.DataFrame,
    holdings: pd.DataFrame,
) -> pd.DataFrame:
    """Publication-quality turnover summary statistics table."""
    owner_map = (
        holdings.groupby('shareholder_name')['owner_type']
        .first().reset_index()
    )
    agg = aggregates.merge(owner_map, on='shareholder_name', how='left')
    
    turnover_cols = ['turnover1', 'turnover2', 'turnover3']
    results = []
    
    for otype in ['All'] + OwnershipType.ALL_TYPES:
        subset = agg if otype == 'All' else agg[agg['owner_type'] == otype]
        row = {'Owner Type': otype, 'N': len(subset)}
        for col in turnover_cols:
            s = subset[col].dropna()
            row[f'{col}_mean'] = s.mean()
            row[f'{col}_median'] = s.median()
            row[f'{col}_std'] = s.std()
        results.append(row)
    
    return pd.DataFrame(results).round(4)

# turnover_summary_table(aggregates, holdings)
def plot_turnover_timeseries(
    aggregates: pd.DataFrame, holdings: pd.DataFrame
):
    """Plot turnover time series by ownership type."""
    owner_map = (
        holdings.groupby('shareholder_name')['owner_type']
        .first().reset_index()
    )
    agg = aggregates.merge(owner_map, on='shareholder_name', how='left')
    
    fig, ax = plt.subplots(figsize=(12, 6))
    for otype in OwnershipType.ALL_INSTITUTIONAL:
        subset = agg[agg['owner_type'] == otype]
        qtr_mean = (
            subset
            .groupby(pd.Grouper(key='rdate', freq='QE'))['turnover1']
            .mean()
        )
        ax.plot(qtr_mean.index, qtr_mean.values, label=otype,
                color=OWNER_COLORS.get(otype, '#333'), linewidth=1.5)
    
    ax.set_title('Quarterly Average Turnover (Carhart)',
                 fontweight='bold')
    ax.set_ylabel('Turnover Ratio')
    ax.legend(loc='best')
    ax.yaxis.set_major_formatter(mticker.PercentFormatter(1.0))
    plt.tight_layout()
    plt.show()

# plot_turnover_timeseries(aggregates, holdings)
Figure 40.1

41 Foreign Ownership Analytics

Vietnam’s foreign ownership limits create unique analytical dimensions absent from developed market studies.

41.1 FOL Utilization

\[ \text{FOL\_Util}_{i,t} = \frac{FO_{i,t}}{FOL_i} \tag{41.1}\]

Stocks with \(\text{FOL\_Util}_{i,t} \to 1\) face mechanical foreign buying restrictions.

def compute_fol_analytics(
    foreign_ownership: pd.DataFrame,
    company_profile: pd.DataFrame,
) -> pd.DataFrame:
    """Compute FOL utilization and related metrics."""
    fo = foreign_ownership.copy()
    fo = fo.merge(
        company_profile[['ticker', 'fol_limit']].drop_duplicates(),
        on='ticker', how='left'
    )
    
    fo['fol_utilization'] = fo['foreign_pct'] / fo['fol_limit']
    fo['foreign_room'] = fo['fol_limit'] - fo['foreign_pct']
    fo['fol_binding'] = (fo['fol_utilization'] >= 0.98)
    fo['fol_category'] = pd.cut(
        fo['fol_utilization'],
        bins=[0, 0.25, 0.50, 0.75, 0.95, 1.0, float('inf')],
        labels=['<25%', '25-50%', '50-75%', '75-95%',
                '95-100%', '>100%']
    )
    return fo

41.2 Room Premium Regression

When foreign ownership approaches the FOL, remaining “room” becomes scarce. We model:

\[ r_{i,t+1} = \alpha + \beta_1 \cdot \text{FOL\_Util}_{i,t} + \beta_2 \cdot \text{FOL\_Util}_{i,t}^2 + \gamma \cdot X_{i,t} + \varepsilon_{i,t} \tag{41.2}\]

The quadratic term captures nonlinear acceleration of the premium as ownership approaches the limit.

def estimate_room_premium(
    fol_analytics: pd.DataFrame,
    price_q: pd.DataFrame,
) -> dict:
    """Estimate foreign ownership room premium via panel regression."""
    fol_q = (
        fol_analytics
        .assign(qdate=lambda x: x['date'] + pd.offsets.QuarterEnd(0))
        .groupby(['ticker', 'qdate'])
        .agg(fol_utilization=('fol_utilization', 'last'),
             foreign_room=('foreign_room', 'last'))
        .reset_index()
    )
    
    panel = fol_q.merge(
        price_q[['ticker', 'qdate', 'mcap', 'qret']],
        on=['ticker', 'qdate'], how='inner'
    )
    
    panel['log_mcap'] = np.log(panel['mcap'] + 1)
    panel['fol_util_sq'] = panel['fol_utilization'] ** 2
    panel = panel.dropna(subset=['qret', 'fol_utilization', 'log_mcap'])
    
    X = panel[['fol_utilization', 'fol_util_sq', 'log_mcap']]
    X = sm.add_constant(X)
    y = panel['qret']
    
    model = sm.OLS(y, X).fit(
        cov_type='cluster', cov_kwds={'groups': panel['ticker']}
    )
    return {'model': model, 'n_obs': len(panel)}

# results = estimate_room_premium(fol_analytics, price_q)
def plot_fol_utilization(fol_analytics: pd.DataFrame):
    """Plot FOL utilization distribution."""
    latest = (
        fol_analytics.sort_values(['ticker', 'date'])
        .groupby('ticker').last().reset_index()
    )
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    axes[0].hist(latest['fol_utilization'].dropna(), bins=50,
                 color='#1f77b4', alpha=0.7, edgecolor='white')
    axes[0].axvline(x=0.95, color='red', linestyle='--',
                     label='95% threshold')
    axes[0].set_title('Panel A: FOL Utilization Distribution',
                       fontweight='bold')
    axes[0].set_xlabel('FOL Utilization Ratio')
    axes[0].set_ylabel('Number of Stocks')
    axes[0].legend()
    
    for exch in ['HOSE', 'HNX', 'UPCOM']:
        sub = latest[latest.get('exchange') == exch]
        if len(sub) > 0:
            axes[1].hist(sub['fol_utilization'].dropna(), bins=30,
                        alpha=0.5, label=exch,
                        color=EXCHANGE_COLORS.get(exch, '#333'))
    axes[1].set_title('Panel B: By Exchange', fontweight='bold')
    axes[1].set_xlabel('FOL Utilization Ratio')
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()

# plot_fol_utilization(fol_analytics)
Figure 41.1

42 Complete Pipeline

We integrate all steps into a single end-to-end function:

def run_complete_pipeline(
    dc: 'DataCoreReader',
    begdate: str = '2010-01-01',
    enddate: str = '2024-12-31',
) -> Dict[str, pd.DataFrame]:
    """
    Execute the complete institutional ownership analytics pipeline.
    
    Steps:
    1. Build corporate action adjustment factors
    2. Process stock prices
    3. Construct holdings panel (Steps 2-4)
    4. Compute IO metrics
    5. Compute institutional trades (Step 5)
    6. Compute portfolio assets and returns (Step 6a)
    7. Compute aggregate buys, sales, trade gains (Step 6b)
    8. Compute net flows and turnover (Step 7)
    9. Compute foreign ownership analytics
    
    Returns dict of all output DataFrames.
    """
    print("=" * 60)
    print("INSTITUTIONAL TRADES, FLOWS, AND TURNOVER PIPELINE")
    print(f"Sample: {begdate} to {enddate}")
    print("=" * 60)
    
    print("\n[1/9] Building adjustment factors...")
    adj_factors = build_adjustment_factors(dc.corporate_actions)
    
    print("\n[2/9] Processing stock prices...")
    price_q, qret = process_prices(
        dc.prices, adj_factors, begdate, enddate
    )
    
    print("\n[3/9] Building holdings panel...")
    holdings = build_holdings_panel(
        dc.ownership, adj_factors, price_q,
        dc.company_profile, begdate, enddate
    )
    
    print("\n[4/9] Computing ownership metrics...")
    io_ratios = compute_io_ratios(holdings, price_q)
    hhi = compute_hhi(holdings)
    breadth = compute_breadth(holdings)
    
    print("\n[5/9] Computing institutional trades...")
    trades = compute_trades(holdings, adj_factors)
    
    print("\n[6/9] Computing portfolio assets...")
    assets = compute_assets_and_returns(holdings, price_q)
    
    print("\n[7/9] Computing aggregate buys and sales...")
    flows = compute_buys_sales(trades, price_q)
    
    print("\n[8/9] Computing net flows and turnover...")
    aggregates = compute_aggregates(holdings, assets, flows)
    
    print("\n[9/9] Computing foreign ownership analytics...")
    fol_analytics = compute_fol_analytics(
        dc.foreign_ownership, dc.company_profile
    )
    
    print("\n" + "=" * 60)
    print("PIPELINE COMPLETE")
    print("=" * 60)
    
    return {
        'price_q': price_q, 'holdings': holdings,
        'io_ratios': io_ratios, 'hhi': hhi,
        'breadth': breadth, 'trades': trades,
        'assets': assets, 'flows': flows,
        'aggregates': aggregates, 'fol_analytics': fol_analytics,
    }

# dc = DataCoreReader('/path/to/datacore_data', file_format='parquet')
# results = run_complete_pipeline(dc, '2010-01-01', '2024-12-31')

43 Advanced Extensions

43.1 Herding Measures

Following Lakonishok, Shleifer, and Vishny (1992), the LSV herding measure is:

\[ HM_{i,t} = \left|\frac{B_{i,t}}{B_{i,t} + S_{i,t}} - p_t\right| - E\left[\left|\frac{B_{i,t}}{B_{i,t} + S_{i,t}} - p_t\right|\right] \tag{43.1}\]

where \(B_{i,t}\) is the number of managers buying stock \(i\) in quarter \(t\), \(S_{i,t}\) the number selling, and \(p_t\) the expected buyer proportion under independent trading.

def compute_lsv_herding(
    trades: pd.DataFrame,
    min_traders: int = 5,
) -> pd.DataFrame:
    """Compute LSV herding measure for each stock-quarter."""
    tc = (
        trades.assign(
            is_buy=trades['trade'] > 0,
            is_sell=trades['trade'] < 0,
        )
        .groupby(['ticker', 'rdate'])
        .agg(
            n_buyers=('is_buy', 'sum'),
            n_sellers=('is_sell', 'sum'),
            n_traders=('trade', 'size'),
        )
        .reset_index()
    )
    
    tc = tc[tc['n_traders'] >= min_traders].copy()
    tc['buy_prop'] = tc['n_buyers'] / tc['n_traders']
    tc['p_t'] = tc.groupby('rdate')['buy_prop'].transform('mean')
    tc['raw_hm'] = (tc['buy_prop'] - tc['p_t']).abs()
    
    from scipy.stats import binom

    def expected_abs_deviation(row):
        """E|B/n - p_t| under Binomial(n, p_t), i.e., independent trading."""
        n = int(row['n_traders'])
        p = row['p_t']
        if n == 0 or p == 0 or p == 1:
            return 0.0
        k = np.arange(0, n + 1)
        probs = binom.pmf(k, n, p)
        return np.sum(np.abs(k / n - p) * probs)
    
    tc['expected_hm'] = tc.apply(expected_abs_deviation, axis=1)
    tc['herding'] = tc['raw_hm'] - tc['expected_hm']
    
    tc['buy_herding'] = np.where(
        tc['buy_prop'] > tc['p_t'], tc['herding'], np.nan
    )
    tc['sell_herding'] = np.where(
        tc['buy_prop'] < tc['p_t'], tc['herding'], np.nan
    )
    
    return tc[['ticker', 'rdate', 'n_buyers', 'n_sellers',
               'n_traders', 'herding', 'buy_herding', 'sell_herding']]

43.2 Demand Persistence

Sias (2004) showed institutional demand is persistent:

\[ \rho_t = \text{Corr}\left(\Delta IO_{i,t},\, \Delta IO_{i,t-1}\right) \tag{43.2}\]

def compute_demand_persistence(io_ratios: pd.DataFrame) -> pd.DataFrame:
    """Rolling cross-sectional correlation of IO changes."""
    io = io_ratios[['ticker', 'rdate', 'io_total_inst']].copy()
    io = io.sort_values(['ticker', 'rdate'])
    io['dio'] = io.groupby('ticker')['io_total_inst'].diff()
    io['lag_dio'] = io.groupby('ticker')['dio'].shift(1)
    
    persistence = (
        io.dropna(subset=['dio', 'lag_dio'])
        .groupby('rdate')
        .apply(lambda g: g['dio'].corr(g['lag_dio']))
        .reset_index()
        .rename(columns={0: 'persistence'})
    )
    persistence = persistence.sort_values('rdate')
    persistence['persistence_ma'] = (
        persistence['persistence'].rolling(window=20, min_periods=4).mean()
    )
    return persistence
def plot_demand_persistence(persistence: pd.DataFrame):
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.bar(persistence['rdate'], persistence['persistence'],
           width=80, alpha=0.3, color='#1f77b4', label='Quarterly')
    ax.plot(persistence['rdate'], persistence['persistence_ma'],
            color='#d62728', linewidth=2, label='Rolling Average')
    ax.axhline(y=0, color='black', linewidth=0.5)
    ax.set_title('Persistence of Institutional Demand', fontweight='bold')
    ax.set_ylabel('Cross-Sectional Correlation')
    ax.legend()
    plt.tight_layout()
    plt.show()
Figure 43.1

43.3 Information Content of Trades

Following Alexander, Cici, and Gibson (2007), the InfoTrade ratio measures the proportion of dollar trading from entry/exit decisions vs. position adjustments:

\[ \text{InfoTrade}_{i,t} = \frac{ \sum_{j: BS \in \{+1,-1\}} |\Delta h_{j,i,t}| \cdot P_{i,t} }{ \sum_j |\Delta h_{j,i,t}| \cdot P_{i,t} } \tag{43.3}\]

def compute_info_trade_ratio(
    trades: pd.DataFrame, price_q: pd.DataFrame
) -> pd.DataFrame:
    """Compute info trade ratio for each stock-quarter."""
    _t = trades.merge(
        price_q[['ticker', 'qdate', 'p']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    _t['dollar_trade'] = _t['trade'].abs() * _t['p'] / 1e6
    _t['is_discrete'] = _t['buysale'].isin([1, -1])
    
    info = (
        _t.assign(discrete_vol=_t['dollar_trade'] * _t['is_discrete'])
        .groupby(['ticker', 'rdate'])
        .agg(
            discrete_vol=('discrete_vol', 'sum'),
            total_vol=('dollar_trade', 'sum'),
        )
        .reset_index()
    )
    
    info['info_trade_ratio'] = (
        info['discrete_vol'] / info['total_vol']
    ).clip(0, 1)
    return info

44 Empirical Applications

44.1 Application 1: Institutional Ownership Changes and Future Returns

We test whether changes in institutional ownership predict future stock returns (Chen, Jegadeesh, and Wermers 2000) via Fama-MacBeth regressions:

\[ r_{i,t+1} = \alpha_t + \beta_{1,t} \cdot \Delta IO_{i,t} + \beta_{2,t} \cdot \Delta\text{Breadth}_{i,t} + \gamma_t \cdot X_{i,t} + \varepsilon_{i,t} \tag{44.1}\]

def fama_macbeth_io_returns(
    io_ratios: pd.DataFrame,
    breadth: pd.DataFrame,
    price_q: pd.DataFrame,
) -> pd.DataFrame:
    """Run Fama-MacBeth regressions of future returns on IO changes."""
    panel = io_ratios[['ticker', 'rdate', 'io_total_inst']].merge(
        breadth[['ticker', 'rdate', 'n_total_inst', 'd_n_total_inst']],
        on=['ticker', 'rdate'], how='inner'
    ).merge(
        price_q[['ticker', 'qdate', 'mcap', 'qret']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    
    panel = panel.sort_values(['ticker', 'rdate'])
    panel['dio'] = panel.groupby('ticker')['io_total_inst'].diff()
    panel['log_mcap'] = np.log(panel['mcap'] + 1)
    panel['mom'] = panel.groupby('ticker')['qret'].shift(1)
    
    reg_vars = ['qret', 'dio', 'd_n_total_inst', 'log_mcap', 'mom']
    panel = panel.dropna(subset=reg_vars)
    
    quarters = sorted(panel['rdate'].unique())
    results = []
    
    for q in quarters:
        qdata = panel[panel['rdate'] == q]
        if len(qdata) < 30:
            continue
        X = sm.add_constant(
            qdata[['dio', 'd_n_total_inst', 'log_mcap', 'mom']]
        )
        try:
            model = sm.OLS(qdata['qret'], X).fit()
            coefs = model.params.to_dict()
            coefs['rdate'] = q
            coefs['n_obs'] = len(qdata)
            results.append(coefs)
        except Exception:
            continue
    
    fm = pd.DataFrame(results)
    
    # Time-series averages with Newey-West t-statistics
    print("\nFama-MacBeth Results:")
    print("=" * 50)
    for var in ['const', 'dio', 'd_n_total_inst', 'log_mcap', 'mom']:
        coefs = fm[var].dropna()
        mean_c = coefs.mean()
        nw_se = sm.OLS(
            coefs - mean_c, np.ones(len(coefs))
        ).fit(cov_type='HAC', cov_kwds={'maxlags': 4}).bse.iloc[0]
        t = mean_c / nw_se if nw_se > 0 else np.nan
        print(f"  {var:20s}: coef={mean_c:8.4f}, t={t:6.2f}")
    
    return fm

44.2 Application 2: Turnover and Performance

Yan (2008) documented a positive turnover-performance relationship. We test in Vietnam:

\[ \alpha_{j,t} = a + b \cdot \text{Turnover}_{j,t-1} + c \cdot \log(A_{j,t-1}) + d \cdot \text{Flow}_{j,t} + \varepsilon_{j,t} \tag{44.2}\]

def turnover_performance_regression(
    aggregates: pd.DataFrame,
) -> dict:
    """Test turnover-performance relationship."""
    agg = aggregates.sort_values(['shareholder_name', 'rdate']).copy()
    agg['lag_turnover1'] = (
        agg.groupby('shareholder_name')['turnover1'].shift(1)
    )
    agg['log_assets'] = np.log(agg['assets'] + 1)
    # Lag assets within each manager (a plain shift would leak across managers)
    agg['flow_ratio'] = (
        agg['netflows']
        / agg.groupby('shareholder_name')['assets'].shift(1)
    )
    
    panel = agg.dropna(
        subset=['pret', 'lag_turnover1', 'log_assets', 'flow_ratio']
    ).copy()
    
    for col in ['pret', 'lag_turnover1', 'flow_ratio']:
        lo, hi = panel[col].quantile([0.01, 0.99])
        panel[col] = panel[col].clip(lo, hi)
    
    X = sm.add_constant(
        panel[['lag_turnover1', 'log_assets', 'flow_ratio']]
    )
    model = sm.OLS(panel['pret'], X).fit(
        cov_type='cluster',
        cov_kwds={'groups': panel['shareholder_name']}
    )
    return {'model': model, 'n': len(panel)}

44.3 Application 3: Foreign vs. Domestic Trading

def compare_foreign_domestic(
    trades: pd.DataFrame, price_q: pd.DataFrame,
) -> pd.DataFrame:
    """Compare trading patterns between foreign and domestic institutions."""
    _t = trades.merge(
        price_q[['ticker', 'qdate', 'p']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    _t['dollar_trade'] = _t['trade'] * _t['p'] / 1e6
    _t['is_buy'] = _t['trade'] > 0
    
    return (
        _t[_t['owner_type'].isin(OwnershipType.INSTITUTIONAL)]
        .groupby('owner_type')
        .agg(
            n_trades=('trade', 'count'),
            n_buys=('is_buy', 'sum'),
            avg_dollar=('dollar_trade', lambda x: x.abs().mean()),
            net_buying=('dollar_trade', 'sum'),
            pct_initiating=('buysale', lambda x: (x.abs() == 1).mean()),
        )
        .reset_index()
    )
def plot_cumulative_net_buying(
    trades: pd.DataFrame, price_q: pd.DataFrame
):
    _t = trades.merge(
        price_q[['ticker', 'qdate', 'p']],
        left_on=['ticker', 'rdate'],
        right_on=['ticker', 'qdate'],
        how='inner'
    )
    _t['trade_vnd'] = _t['trade'] * _t['p'] / 1e9
    
    inst = _t[_t['owner_type'].isin(OwnershipType.INSTITUTIONAL)]
    net = (
        inst.groupby(
            [pd.Grouper(key='rdate', freq='QE'), 'owner_type']
        )['trade_vnd'].sum().unstack(fill_value=0)
    )
    cum = net.cumsum()
    
    fig, ax = plt.subplots(figsize=(12, 6))
    for col in cum.columns:
        ax.plot(cum.index, cum[col], label=col,
                color=OWNER_COLORS.get(col, '#333'), linewidth=2)
    ax.axhline(y=0, color='black', linewidth=0.5)
    ax.set_title('Cumulative Net Institutional Buying', fontweight='bold')
    ax.set_ylabel('Billions VND')
    ax.legend(loc='best')
    plt.tight_layout()
    plt.show()
Figure 44.1

45 Data Quality and Robustness

45.1 Common Pitfalls

45.1.1 Corporate Action Misadjustment

Caution: Example of a Phantom Trade from an Unadjusted Stock Dividend

Vinamilk (VNM) issues a 20% stock dividend with ex-date March 15, 2023.

  • Q4 2022: Fund X holds 1,000,000 shares of VNM
  • Q1 2023: Fund X holds 1,200,000 shares of VNM

  • Without adjustment: an inferred buy of +200,000 shares (BS = +2).
  • With adjustment: prior holdings become 1,200,000 adjusted shares, so the trade is 0.

This phantom trade inflates measured turnover and creates spurious buying signals.
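The example above can be checked mechanically. A minimal sketch, assuming a single cumulative adjustment factor of 1.2 covering the March 2023 ex-date:

```python
# Q4 2022 holding (pre ex-date) vs. Q1 2023 holding (post 20% stock dividend)
shares_q4, shares_q1 = 1_000_000, 1_200_000
adj_factor = 1.2  # cumulative share adjustment for the 20% stock dividend

naive_trade = shares_q1 - shares_q4                         # phantom +200,000 "buy"
adjusted_trade = shares_q1 - round(shares_q4 * adj_factor)  # 0: no actual trade

print(naive_trade, adjusted_trade)  # 200000 0
```

The rounding guards against float noise in the adjustment factor; in the pipeline this is handled by `build_adjustment_factors` and the `shares_adj` column.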

45.1.2 Disclosure Timing Mismatches

Vietnamese ownership disclosure dates may not align with calendar quarter ends. Our pipeline addresses this by aligning all disclosures to the nearest quarter-end.
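One simple snapping convention, also used in `estimate_room_premium` above, rolls each disclosure date forward to its containing quarter-end via `pd.offsets.QuarterEnd(0)` (dates already on a quarter-end are unchanged):

```python
import pandas as pd

dates = pd.to_datetime(['2023-02-17', '2023-03-31', '2023-04-02'])
# QuarterEnd(0) rolls forward to the quarter-end; on-offset dates stay put
qdate = dates + pd.offsets.QuarterEnd(0)
print([d.date().isoformat() for d in qdate])
# ['2023-03-31', '2023-03-31', '2023-06-30']
```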

45.1.3 Name Changes and Entity Mergers

Vietnamese institutions frequently rename. Without a stable identifier, the same entity may appear as two different shareholders, creating phantom entries/exits. We recommend maintaining a master entity mapping table.
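A minimal sketch of such a mapping table (alias entries are hypothetical), applied before any grouping on `shareholder_name`:

```python
import pandas as pd

# Hand-maintained alias -> canonical-name mapping (hypothetical entries)
entity_map = {
    'VEIL Fund': 'Vietnam Enterprise Investments Ltd',
    'Vietnam Enterprise Investments Limited': 'Vietnam Enterprise Investments Ltd',
}

holdings = pd.DataFrame({
    'shareholder_name': ['VEIL Fund', 'Some Other Fund'],
    'ticker': ['VNM', 'FPT'],
})

# Canonicalize; names without a mapping pass through unchanged
holdings['shareholder_name'] = (
    holdings['shareholder_name']
    .map(entity_map)
    .fillna(holdings['shareholder_name'])
)
print(holdings['shareholder_name'].tolist())
# ['Vietnam Enterprise Investments Ltd', 'Some Other Fund']
```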

45.2 Validation Checks

def validate_pipeline_outputs(
    results: Dict[str, pd.DataFrame],
) -> pd.DataFrame:
    """Run comprehensive validation on pipeline outputs."""
    checks = []
    h = results['holdings']
    t = results['trades']
    a = results['aggregates']
    
    checks.append({
        'Check': 'No negative adjusted shares',
        'Result': 'PASS' if (h['shares_adj'] < 0).sum() == 0 else 'FAIL',
        'Detail': f'{(h["shares_adj"] < 0).sum()} negative obs'
    })
    
    checks.append({
        'Check': 'No duplicate holdings',
        'Result': 'PASS' if h.duplicated(
            subset=['shareholder_name', 'ticker', 'rdate']
        ).sum() == 0 else 'FAIL',
    })
    
    checks.append({
        'Check': 'Valid buysale codes only',
        'Result': 'PASS' if t['buysale'].isin([1, 2, -1, -2]).all()
        else 'FAIL',
    })
    
    checks.append({
        'Check': 'No zero trades',
        'Result': 'PASS' if (t['trade'] == 0).sum() == 0 else 'FAIL',
    })
    
    t1 = a['turnover1'].dropna()
    checks.append({
        'Check': 'Turnover1 in [0, 10]',
        'Result': 'PASS' if ((t1 < 0) | (t1 > 10)).sum() == 0
        else 'WARNING',
        'Detail': f'{((t1<0)|(t1>10)).sum()} extreme values'
    })
    
    first_rpt = a[a['first_report']]
    checks.append({
        'Check': 'First report -> missing netflows',
        'Result': 'PASS' if first_rpt['netflows'].isna().all()
        else 'FAIL',
    })
    
    return pd.DataFrame(checks)

# validate_pipeline_outputs(results)

46 Summary

This chapter developed a framework for computing institutional trades, flows, and turnover ratios in the Vietnamese equity market. The key contributions include:

  1. Corporate action adjustment for Vietnam’s frequent stock dividends and bonus shares, preventing phantom trades that contaminate standard differencing.

  2. Four-way ownership taxonomy (state, foreign institutional, domestic institutional, individual) capturing Vietnam’s unique ownership landscape.

  3. FOL utilization analytics for studying foreign ownership constraints absent from developed markets.

  4. Irregular disclosure handling with correct gap splitting into terminating sales and initiating buys.

  5. Advanced extensions including herding, demand persistence, and information content decomposition.

The pipeline produces several output datasets (Table 46.1).

Table 46.1: Summary of Pipeline Output Datasets
Output         Grain                            Key Variables                    Use Cases
holdings       Shareholder x Ticker x Quarter   shares_adj, owner_type           Cross-sectional ownership
io_ratios      Ticker x Quarter                 io_state, io_foreign, etc.       Governance, liquidity
trades         Shareholder x Ticker x Quarter   trade, buysale                   Informed trading, herding
aggregates     Shareholder x Quarter            assets, turnover, netflows       Fund performance, flows
fol_analytics  Ticker x Date                    fol_utilization, foreign_room    FOL premium, foreign investment