from pathlib import Path
from typing import Union

import numpy as np
import pandas as pd


class DataCoreReader:
    """
    Unified data reader for DataCore.vn datasets.

    Assumes data has been downloaded from DataCore.vn and stored locally.
    Supports both Parquet (recommended for performance) and CSV formats.

    Parameters
    ----------
    data_dir : str or Path
        Root directory containing DataCore.vn data files
    file_format : str
        'parquet' or 'csv' (default: 'parquet')
    """

    # Expected file names in the data directory
    FILE_MAP = {
        'prices': 'stock_prices',
        'ownership': 'ownership_structure',
        'major_shareholders': 'major_shareholders',
        'corporate_actions': 'corporate_actions',
        'company_profile': 'company_profile',
        'financials': 'financial_statements',
        'foreign_ownership': 'foreign_ownership_daily',
        'fund_holdings': 'fund_holdings',
    }

    def __init__(self, data_dir: Union[str, Path], file_format: str = 'parquet'):
        self.data_dir = Path(data_dir)
        self.fmt = file_format
        self._cache = {}
        # Verify data directory exists
        if not self.data_dir.exists():
            raise FileNotFoundError(
                f"Data directory not found: {self.data_dir}\n"
                f"Please download data from DataCore.vn and place it in this directory."
            )
        print(f"DataCore.vn reader initialized: {self.data_dir}")
        available = [f.stem for f in self.data_dir.glob(f'*.{self.fmt}')]
        print(f"Available datasets: {available}")

    def _read(self, key: str) -> pd.DataFrame:
        """Read and cache a dataset."""
        if key in self._cache:
            return self._cache[key]
        fname = self.FILE_MAP.get(key, key)
        filepath = self.data_dir / f"{fname}.{self.fmt}"
        if not filepath.exists():
            raise FileNotFoundError(
                f"Dataset not found: {filepath}\n"
                f"Expected file: {fname}.{self.fmt} in {self.data_dir}"
            )
        if self.fmt == 'parquet':
            df = pd.read_parquet(filepath)
        else:
            df = pd.read_csv(filepath)
        # Auto-detect and parse date columns
        for col in df.columns:
            if 'date' in col.lower() or col.lower() in ['period', 'ex_date', 'record_date']:
                try:
                    df[col] = pd.to_datetime(df[col])
                except (ValueError, TypeError):
                    pass
        self._cache[key] = df
        print(f"Loaded {key}: {len(df):,} rows, {len(df.columns)} columns")
        return df

    @property
    def prices(self) -> pd.DataFrame:
        return self._read('prices')

    @property
    def ownership(self) -> pd.DataFrame:
        return self._read('ownership')

    @property
    def major_shareholders(self) -> pd.DataFrame:
        return self._read('major_shareholders')

    @property
    def corporate_actions(self) -> pd.DataFrame:
        return self._read('corporate_actions')

    @property
    def company_profile(self) -> pd.DataFrame:
        return self._read('company_profile')

    @property
    def financials(self) -> pd.DataFrame:
        return self._read('financials')

    @property
    def foreign_ownership(self) -> pd.DataFrame:
        return self._read('foreign_ownership')

    @property
    def fund_holdings(self) -> pd.DataFrame:
        return self._read('fund_holdings')

    def clear_cache(self):
        """Clear all cached datasets to free memory."""
        self._cache.clear()
# Initialize reader — adjust path to your local DataCore.vn data
# dc = DataCoreReader('/path/to/datacore_data', file_format='parquet')

33 Institutional Ownership Analytics in Vietnam
33.1 Institutional Ownership in Vietnam: A Distinct Landscape
Vietnam’s equity market presents a fundamentally different institutional ownership landscape from the mature markets of the US, Europe, or Japan. Since the Ho Chi Minh City Securities Trading Center (now HOSE) opened on July 28, 2000 with just two listed stocks, the market has grown to over 1,700 listed companies across three exchanges (HOSE, HNX, and UPCOM) with a combined market capitalization exceeding 200 billion USD. Yet the ownership structure remains distinctive in several critical ways:
Retail dominance. Individual investors account for approximately 85% of trading value on Vietnamese exchanges, far exceeding the institutional share. This contrasts sharply with the US, where institutional investors dominate both ownership and trading (Bao Dinh and Tran 2024). The implications for market efficiency, price discovery, and volatility are profound.
State ownership legacy. Vietnam’s equitization (privatization) program, an outgrowth of the Đổi Mới reforms launched in 1986, means that the state remains a significant or controlling shareholder in many listed companies. As of 2022, SOEs (firms with state ownership > 50%) account for approximately 30% of total market capitalization despite representing less than 10% of listed firms (Huang, Liu, and Shu 2023). State ownership introduces unique agency problems, governance dynamics, and liquidity constraints.
Foreign Ownership Limits (FOLs). Vietnam imposes sector-specific caps on aggregate foreign ownership, typically 49% for most sectors, 30% for banking, and varying limits for aviation, media, and telecommunications. When a stock reaches its FOL, foreign investors can only buy from other foreign sellers, creating a segmented market with distinct pricing dynamics and a well-documented “FOL premium” (Vo 2015).
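The FOL mechanics reduce to simple arithmetic on the cap. A minimal sketch with hypothetical numbers (a bank capped at 30% whose foreign holders currently own 29.1% of shares outstanding):

```python
# Hypothetical numbers: a bank capped at 30% aggregate foreign ownership,
# with foreign holders currently at 29.1% of shares outstanding.
fol_limit = 0.30
foreign_pct = 0.291

foreign_room = fol_limit - foreign_pct      # fraction still open to new foreign buyers
fol_utilization = foreign_pct / fol_limit   # share of the cap already used

print(f"room = {foreign_room:.3f}, utilization = {fol_utilization:.1%}")
# Once room reaches zero, foreign investors can only buy from foreign sellers,
# and transactions between foreigners may print at a premium to the board price.
```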
Disclosure regime. Unlike the US quarterly 13F filing system, Vietnam’s ownership disclosure is event-driven and periodic. Major shareholders (≥5%) must disclose within 7 business days of crossing thresholds. Annual reports contain detailed shareholder registers. Semi-annual fund reports provide portfolio snapshots. This creates a patchwork of disclosure frequencies that require careful handling.
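One practical consequence of this patchwork is that annual registers, semi-annual fund reports, and event-driven filings must be aligned onto a common grid before any panel analysis. A minimal sketch (hypothetical ticker and figures) that carries the latest disclosure forward to each quarter-end with `pandas.merge_asof`:

```python
import pandas as pd

# Hypothetical event-driven disclosures for one ticker
disclosures = pd.DataFrame({
    'ticker': ['VNM', 'VNM', 'VNM'],
    'date': pd.to_datetime(['2023-01-12', '2023-05-03', '2023-11-20']),
    'ownership_pct': [0.36, 0.41, 0.38],
})
# The quarterly grid we want to populate
quarter_ends = pd.DataFrame({
    'ticker': 'VNM',
    'quarter_end': pd.to_datetime(['2023-03-31', '2023-06-30',
                                   '2023-09-30', '2023-12-31']),
})

# Backward as-of merge: each quarter-end picks up the most recent
# disclosure on or before that date (both frames must be sorted on keys)
panel = pd.merge_asof(
    quarter_ends.sort_values('quarter_end'),
    disclosures.sort_values('date'),
    left_on='quarter_end', right_on='date',
    by='ticker', direction='backward',
)
print(panel[['quarter_end', 'ownership_pct']])
```

Note the implicit assumption: ownership is unchanged between disclosures, which is exactly why event-driven regimes understate intra-period trading.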
33.2 Data Infrastructure: DataCore.vn
DataCore.vn is a comprehensive Vietnamese financial data platform that provides academic-grade datasets for the Vietnamese market. Throughout this chapter, we assume all data is sourced exclusively from DataCore.vn, which provides:
| DataCore.vn Dataset | Content | Key Variables |
|---|---|---|
| Stock Prices | Daily/monthly OHLCV for HOSE, HNX, UPCOM | ticker, date, close, adjusted_close, volume, shares_outstanding |
| Ownership Structure | Shareholder composition snapshots | ticker, date, shareholder_name, shares_held, ownership_pct, shareholder_type |
| Major Shareholders | Detailed ≥5% holders | ticker, date, shareholder_name, shares_held, is_foreign, is_state, is_institution |
| Corporate Actions | Dividends, stock splits, bonus shares, rights issues | ticker, ex_date, action_type, ratio, record_date |
| Company Profile | Sector, exchange, listing date, charter capital | ticker, exchange, industry_code, listing_date, fol_limit |
| Financial Statements | Quarterly/annual financials | ticker, period, revenue, net_income, total_assets, equity |
| Foreign Ownership | Daily foreign ownership tracking | ticker, date, foreign_shares, foreign_pct, fol_limit, foreign_room |
| Fund Holdings | Semi-annual fund portfolio disclosures | fund_name, report_date, ticker, shares_held, market_value |
This chapter proceeds as follows. Section 33.3 builds the complete data pipeline from raw DataCore.vn extracts to clean, analysis-ready datasets, with particular attention to corporate action adjustments. Section 33.4 defines Vietnam’s unique ownership taxonomy. Section 33.5 computes institutional ownership ratios, concentration, and breadth for the Vietnamese market. Section 33.6 develops specialized foreign ownership analytics including FOL utilization and room premium. Section 33.7 derives institutional trades from ownership disclosure snapshots. Section 33.8 computes fund-level flows and turnover. Section 33.9 analyzes state ownership dynamics. Section 33.10 introduces network analysis, ML classification, and event-study frameworks. Section 33.11 presents complete empirical applications, and Section 33.12 concludes.
33.3 Data Pipeline
33.3.1 Stock Price Data and Corporate Action Adjustments
Vietnam’s equity market is notorious for frequent corporate actions, particularly stock dividends and bonus share issuances, that dramatically alter share counts. A company issuing a 30% stock dividend means every 100 shares become 130 shares, and the reference price adjusts downward proportionally. Failure to properly adjust historical shares and prices for these events is the single most common source of error in Vietnamese equity research.
# ============================================================================
# Step 1: Corporate Action Adjustment Factors
# ============================================================================
def build_adjustment_factors(corporate_actions: pd.DataFrame) -> pd.DataFrame:
    """
    Build cumulative adjustment factors from the corporate actions history.

    In Vietnam, the most common share-altering corporate actions are:
    1. Stock dividends (cổ tức bằng cổ phiếu): e.g., 30% → ratio = 0.30
       Effect: shares × (1 + 0.30), price × (1 / 1.30)
    2. Bonus shares (thưởng cổ phiếu): mechanically identical to stock dividends
    3. Stock splits (chia tách): e.g., 2:1 → ratio = 2.0
       Effect: shares × 2, price × 0.5
    4. Rights issues (phát hành thêm): dilutive, but not all shareholders
       exercise. We approximate with the subscription ratio
    5. Reverse splits (gộp cổ phiếu): rare in Vietnam
       Effect: shares ÷ ratio, price × ratio

    We construct a FORWARD-LOOKING cumulative adjustment factor such that:
        adjusted_shares = raw_shares × cum_adj_factor(from_date, to_date)
        adjusted_price  = raw_price / cum_adj_factor(from_date, to_date)
    This is analogous to CRSP's cfacshr in the US context.

    Parameters
    ----------
    corporate_actions : pd.DataFrame
        DataCore.vn corporate actions with columns:
        ticker, ex_date, action_type, ratio
        action_type values:
        - 'stock_dividend': ratio = dividend rate (e.g., 0.30 for 30%)
        - 'bonus_shares': ratio = bonus rate (e.g., 0.20 for 20%)
        - 'stock_split': ratio = split factor (e.g., 2.0 for 2:1)
        - 'reverse_split': ratio = merge factor (e.g., 5.0 for 5:1 merge)
        - 'rights_issue': ratio = subscription rate (e.g., 0.10 for 10:1)
        - 'cash_dividend': ratio = VND per share (no share adjustment needed)

    Returns
    -------
    pd.DataFrame
        Adjustment factors: ticker, ex_date, point_factor, cum_factor
    """
    # Filter to share-altering events only
    share_events = ['stock_dividend', 'bonus_shares', 'stock_split',
                    'reverse_split', 'rights_issue']
    ca = corporate_actions[
        corporate_actions['action_type'].isin(share_events)
    ].copy()
    if len(ca) == 0:
        print("No share-altering corporate actions found.")
        return pd.DataFrame(columns=['ticker', 'ex_date', 'point_factor', 'cum_factor'])

    # Compute point adjustment factor for each event
    def compute_point_factor(row):
        atype = row['action_type']
        ratio = row['ratio']
        if atype in ['stock_dividend', 'bonus_shares']:
            # 30% stock dividend: 100 shares → 130 shares
            return 1 + ratio
        elif atype == 'stock_split':
            # 2:1 split: 100 shares → 200 shares
            return ratio
        elif atype == 'reverse_split':
            # 5:1 reverse: 500 shares → 100 shares
            return 1.0 / ratio
        elif atype == 'rights_issue':
            # Approximate: assume all rights exercised
            # In practice, this overestimates the adjustment
            return 1 + ratio
        else:
            return 1.0

    ca['point_factor'] = ca.apply(compute_point_factor, axis=1)

    # Sort chronologically within each ticker
    ca = ca.sort_values(['ticker', 'ex_date']).reset_index(drop=True)

    # Cumulative factor: product of all point factors from listing to date
    # This gives us a running "total adjustment" for each ticker
    ca['cum_factor'] = ca.groupby('ticker')['point_factor'].cumprod()

    # Summary statistics
    n_tickers = ca['ticker'].nunique()
    n_events = len(ca)
    avg_events = n_events / n_tickers if n_tickers > 0 else 0
    print(f"Corporate action adjustment factors built:")
    print(f"  Tickers with adjustments: {n_tickers:,}")
    print(f"  Total share-altering events: {n_events:,}")
    print(f"  Average events per ticker: {avg_events:.1f}")
    print(f"\nEvent type distribution:")
    print(ca['action_type'].value_counts().to_string())

    return ca[['ticker', 'ex_date', 'action_type', 'ratio',
               'point_factor', 'cum_factor']]
def adjust_shares(shares: float, ticker: str, from_date, to_date,
                  adj_factors: pd.DataFrame) -> float:
    """
    Adjust a share count from one date to another for corporate actions.

    Example: If a company had a 30% stock dividend with ex_date between
    from_date and to_date, then 1000 shares at from_date = 1300 shares
    at to_date.

    Parameters
    ----------
    shares : float
        Number of shares at from_date
    ticker : str
        Stock ticker
    from_date, to_date : pd.Timestamp
        Period for adjustment
    adj_factors : pd.DataFrame
        Output of build_adjustment_factors()

    Returns
    -------
    float
        Adjusted shares at to_date
    """
    events = adj_factors[
        (adj_factors['ticker'] == ticker) &
        (adj_factors['ex_date'] > pd.Timestamp(from_date)) &
        (adj_factors['ex_date'] <= pd.Timestamp(to_date))
    ]
    if len(events) == 0:
        return shares
    total_factor = events['point_factor'].prod()
    return shares * total_factor

# Example usage:
# adj_factors = build_adjustment_factors(dc.corporate_actions)

Vietnamese companies issue stock dividends with remarkable frequency; many growth companies do so two to three times per year. Consider Vinhomes (VHM) or FPT Corporation: their share counts may double or triple over a 5-year period purely from stock dividends. If you compare raw ownership shares from 2019 to 2024 without adjustment, you will obtain nonsensical ownership ratios. Every time-series analysis of Vietnamese ownership data must use adjusted shares. This is the Vietnamese equivalent of the CRSP cfacshr adjustment factor problem in US data, but more severe because the events are more frequent and larger in magnitude.
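To make the mechanics concrete, here is the adjust_shares logic restated as a self-contained toy check (hypothetical ticker and events): a 30% stock dividend followed by a 2:1 split between the two snapshot dates multiplies the share count by 1.3 × 2.

```python
import pandas as pd

# Toy adjustment table: a 30% stock dividend in 2020, then a 2:1 split in 2022
adj_factors = pd.DataFrame({
    'ticker': ['FPT', 'FPT'],
    'ex_date': pd.to_datetime(['2020-06-15', '2022-03-10']),
    'point_factor': [1.30, 2.0],
})

def adjust_shares(shares, ticker, from_date, to_date, adj_factors):
    """Roll a share count forward through all events in (from_date, to_date]."""
    events = adj_factors[
        (adj_factors['ticker'] == ticker) &
        (adj_factors['ex_date'] > pd.Timestamp(from_date)) &
        (adj_factors['ex_date'] <= pd.Timestamp(to_date))
    ]
    return shares * events['point_factor'].prod()

# 1,000 shares held at end-2019 correspond to 2,600 shares at end-2024
print(f"{adjust_shares(1_000, 'FPT', '2019-12-31', '2024-12-31', adj_factors):.0f}")
```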
# ============================================================================
# Step 2: Process Stock Price Data
# ============================================================================
def process_price_data(prices: pd.DataFrame,
                       adj_factors: pd.DataFrame,
                       company_profile: pd.DataFrame) -> pd.DataFrame:
    """
    Process DataCore.vn stock price data:
    1. Align dates to month-end and quarter-end
    2. Merge company metadata (exchange, sector, FOL limit)
    3. Compute adjusted prices and shares outstanding
    4. Compute market capitalization
    5. Create quarter-end snapshots

    Parameters
    ----------
    prices : pd.DataFrame
        Daily/monthly price data from DataCore.vn
    adj_factors : pd.DataFrame
        Corporate action adjustment factors
    company_profile : pd.DataFrame
        Company metadata including exchange, sector, FOL

    Returns
    -------
    pd.DataFrame
        Quarter-end processed stock data
    """
    df = prices.copy()

    # Standardize dates
    df['date'] = pd.to_datetime(df['date'])
    df['month_end'] = df['date'] + pd.offsets.MonthEnd(0)
    df['quarter_end'] = df['date'] + pd.offsets.QuarterEnd(0)

    # Merge company profile
    profile_cols = ['ticker', 'exchange', 'industry_code', 'fol_limit',
                    'listing_date', 'company_name']
    profile_cols = [c for c in profile_cols if c in company_profile.columns]
    df = df.merge(company_profile[profile_cols], on='ticker', how='left')

    # Build cumulative adjustment factor for each ticker-date:
    # for each observation, the total adjustment from listing to that date
    df = df.sort_values(['ticker', 'date'])

    def get_cum_factor_at_date(group):
        ticker = group.name
        ticker_adj = adj_factors[adj_factors['ticker'] == ticker].copy()
        if len(ticker_adj) == 0:
            group['cum_adj_factor'] = 1.0
            return group
        # For each date, the cumulative factor is the product of all
        # point factors with ex_date on or before that date
        group = group.sort_values('date')
        group['cum_adj_factor'] = 1.0
        for _, event in ticker_adj.iterrows():
            mask = group['date'] >= event['ex_date']
            group.loc[mask, 'cum_adj_factor'] *= event['point_factor']
        return group

    df = df.groupby('ticker', group_keys=False).apply(get_cum_factor_at_date)

    # Adjusted price and shares
    # adjusted_close should already be provided by DataCore.vn,
    # but we compute our own for consistency
    if 'adjusted_close' not in df.columns:
        df['adjusted_close'] = df['close'] / df['cum_adj_factor']

    # Adjusted shares outstanding
    df['adjusted_shares'] = df['shares_outstanding'] * df['cum_adj_factor']

    # Market capitalization (in billion VND)
    df['market_cap'] = df['close'] * df['shares_outstanding'] / 1e9

    # Returns on adjusted prices (period matches the input frequency)
    df = df.sort_values(['ticker', 'date'])
    df['ret'] = df.groupby('ticker')['adjusted_close'].pct_change()

    # Keep quarter-end observations
    # (for daily data: the last trading day of each quarter)
    df_quarterly = (df.sort_values(['ticker', 'quarter_end', 'date'])
                    .groupby(['ticker', 'quarter_end'])
                    .last()
                    .reset_index())

    print(f"Processed price data:")
    print(f"  Total records (daily): {len(df):,}")
    print(f"  Quarter-end records: {len(df_quarterly):,}")
    print(f"  Unique tickers: {df_quarterly['ticker'].nunique():,}")
    print(f"  Date range: {df_quarterly['quarter_end'].min()} to "
          f"{df_quarterly['quarter_end'].max()}")
    print(f"\nExchange distribution:")
    print(df_quarterly.groupby('exchange')['ticker'].nunique().to_string())

    return df_quarterly

# prices_q = process_price_data(dc.prices, adj_factors, dc.company_profile)

33.3.2 Ownership Structure Data
Vietnamese ownership data captures the composition of shareholders as disclosed in annual reports, semi-annual reports, and event-driven disclosures. The key distinction from US 13F data is that Vietnamese disclosures provide a complete ownership decomposition, not just institutional long positions, but the full breakdown into state, institutional, foreign, and individual ownership.
# ============================================================================
# Step 3: Process Ownership Structure Data
# ============================================================================
class OwnershipType:
    """
    Vietnam's ownership taxonomy.

    Unlike the US, where 13F captures only institutional long positions,
    Vietnamese disclosure provides a complete ownership decomposition.
    We classify shareholders into five mutually exclusive categories.
    """
    STATE = 'state'                  # Nhà nước (government entities, SOE parents)
    FOREIGN_INST = 'foreign_inst'    # Tổ chức nước ngoài
    DOMESTIC_INST = 'domestic_inst'  # Tổ chức trong nước (non-state)
    INDIVIDUAL = 'individual'        # Cá nhân
    TREASURY = 'treasury'            # Cổ phiếu quỹ

    ALL_TYPES = [STATE, FOREIGN_INST, DOMESTIC_INST, INDIVIDUAL, TREASURY]
    INSTITUTIONAL = [STATE, FOREIGN_INST, DOMESTIC_INST]
    FOREIGN = [FOREIGN_INST]  # Can be expanded if foreign individuals are tracked

def classify_shareholders(ownership: pd.DataFrame) -> pd.DataFrame:
    """
    Classify shareholders into Vietnam's ownership taxonomy.

    DataCore.vn may provide a `shareholder_type` field, but naming
    conventions vary. This function standardizes the classification
    using a combination of provided flags and name-based heuristics.

    The classification challenge in Vietnam (noted by Huang, Liu, and
    Shu 2023): DataCore.vn may not always cleanly separate institution
    types, so we use a cascading approach:
    1. Use explicit flags (is_state, is_foreign, is_institution) if available
    2. Apply name-based heuristics for Vietnamese entity names
    3. Default to 'individual' for unclassified shareholders

    Parameters
    ----------
    ownership : pd.DataFrame
        Raw ownership data from DataCore.vn

    Returns
    -------
    pd.DataFrame
        Ownership data with standardized `owner_type` column
    """
    df = ownership.copy()

    # --- Method 1: Use explicit flags if available ---
    if all(col in df.columns for col in ['is_state', 'is_foreign', 'is_institution']):
        conditions = [
            (df['is_state'] == True),
            (df['is_foreign'] == True) & (df['is_institution'] == True),
            (df['is_foreign'] == True) & (df['is_institution'] != True),
            (df['is_institution'] == True) & (df['is_state'] != True) &
            (df['is_foreign'] != True),
        ]
        choices = [
            OwnershipType.STATE,
            OwnershipType.FOREIGN_INST,
            OwnershipType.FOREIGN_INST,  # Foreign individuals often grouped
            OwnershipType.DOMESTIC_INST,
        ]
        df['owner_type'] = np.select(conditions, choices,
                                     default=OwnershipType.INDIVIDUAL)

    # --- Method 2: Name-based heuristics ---
    elif 'shareholder_name' in df.columns:
        name = df['shareholder_name'].str.lower().fillna('')
        # State entities: government ministries, SCIC, state corporations
        state_keywords = [
            'bộ tài chính', 'tổng công ty đầu tư', 'scic',
            'ủy ban nhân dân', 'nhà nước', 'state capital',
            'tổng công ty', 'vốn nhà nước', 'bộ công thương',
            'bộ quốc phòng', 'bộ giao thông', 'vinashin',
        ]
        is_state = name.apply(
            lambda x: any(kw in x for kw in state_keywords)
        )
        # Foreign entities: common fund names, foreign company patterns.
        # Keyword matching on English-language terms doubles as a
        # non-Vietnamese-name heuristic
        foreign_keywords = [
            'fund', 'investment', 'capital', 'limited', 'ltd', 'inc',
            'corporation', 'holdings', 'asset management', 'pte',
            'gmbh', 'management', 'partners', 'advisors',
            'dragon capital', 'vinacapital', 'templeton',
            'blackrock', 'jpmorgan', 'samsung', 'mirae',
        ]
        is_foreign_name = name.apply(
            lambda x: any(kw in x for kw in foreign_keywords)
        )
        # Domestic institutions: Vietnamese bank, securities, insurance names
        domestic_inst_keywords = [
            'ngân hàng', 'chứng khoán', 'bảo hiểm', 'quỹ đầu tư',
            'công ty quản lý', 'bảo việt', 'techcombank', 'vietcombank',
            'bidv', 'vietinbank', 'vpbank', 'mb bank', 'ssi', 'hsc',
            'vcsc', 'vndirect', 'fpt capital', 'manulife',
        ]
        is_domestic_inst = name.apply(
            lambda x: any(kw in x for kw in domestic_inst_keywords)
        )
        # Treasury shares
        is_treasury = name.str.contains('cổ phiếu quỹ|treasury', case=False)
        # Apply classification cascade (later assignments take precedence)
        df['owner_type'] = OwnershipType.INDIVIDUAL  # Default
        df.loc[is_domestic_inst, 'owner_type'] = OwnershipType.DOMESTIC_INST
        df.loc[is_foreign_name, 'owner_type'] = OwnershipType.FOREIGN_INST
        df.loc[is_state, 'owner_type'] = OwnershipType.STATE
        df.loc[is_treasury, 'owner_type'] = OwnershipType.TREASURY

    # --- Method 3: Use shareholder_type directly ---
    elif 'shareholder_type' in df.columns:
        type_map = {
            'state': OwnershipType.STATE,
            'foreign_institution': OwnershipType.FOREIGN_INST,
            'foreign_individual': OwnershipType.FOREIGN_INST,
            'domestic_institution': OwnershipType.DOMESTIC_INST,
            'individual': OwnershipType.INDIVIDUAL,
            'treasury': OwnershipType.TREASURY,
        }
        df['owner_type'] = df['shareholder_type'].str.lower().map(type_map)
        df['owner_type'] = df['owner_type'].fillna(OwnershipType.INDIVIDUAL)

    else:
        raise ValueError(
            "Cannot classify shareholders. Expected one of:\n"
            "  1. Columns: is_state, is_foreign, is_institution\n"
            "  2. Column: shareholder_name (for heuristic classification)\n"
            "  3. Column: shareholder_type (pre-classified)"
        )

    # Summary
    print("Ownership classification results:")
    print(df['owner_type'].value_counts().to_string())
    return df

# ownership_classified = classify_shareholders(dc.ownership)

33.4 Vietnam’s Ownership Taxonomy
33.4.1 The Five Ownership Categories
Vietnam’s ownership structure is decomposed into five mutually exclusive categories that together sum to 100% of shares outstanding:
| Category | Vietnamese Term | Description | Typical Share (2020s) |
|---|---|---|---|
| State | Sở hữu Nhà nước | Government entities, SCIC, SOE parent companies | ~15-25% of market cap |
| Foreign Institutional | Tổ chức nước ngoài | Foreign funds, banks, corporations | ~15-20% |
| Domestic Institutional | Tổ chức trong nước | Vietnamese funds, banks, insurance, securities firms | ~5-10% |
| Individual | Cá nhân | Retail investors (both Vietnamese and foreign individuals) | ~55-65% |
| Treasury | Cổ phiếu quỹ | Company’s own repurchased shares | ~0-2% |
This taxonomy differs fundamentally from the US 13F framework in several ways:
- Completeness: We observe 100% of ownership, not just institutional long positions above $100 million AUM.
- State as a category: State ownership is a first-class analytical category, not subsumed under “All Others” as in the LSEG type code system.
- Individual visibility: We observe aggregate individual ownership directly, whereas in the US, individual ownership is merely the residual (100% − institutional ownership).
- No short position ambiguity: Vietnam’s market has very limited short-selling infrastructure, so ownership data genuinely represents long positions.
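Completeness also buys a cheap data-quality invariant: because the taxonomy is mutually exclusive and exhaustive, the five category share counts should sum to total shares outstanding. A sketch with made-up numbers, using column names matching the `shares_*` convention of the pipeline below:

```python
import pandas as pd

# Two hypothetical stock-quarter rows: an SOE and a foreign-heavy private firm
decomp = pd.DataFrame({
    'shares_state':         [650_000,       0],
    'shares_foreign_inst':  [200_000, 490_000],
    'shares_domestic_inst': [ 50_000, 110_000],
    'shares_individual':    [ 95_000, 380_000],
    'shares_treasury':      [  5_000,  20_000],
    'shares_outstanding':   [1_000_000, 1_000_000],
})

cat_cols = ['shares_state', 'shares_foreign_inst', 'shares_domestic_inst',
            'shares_individual', 'shares_treasury']

# Absolute gap between the category sum and shares outstanding
gap = (decomp[cat_cols].sum(axis=1) - decomp['shares_outstanding']).abs()

# Flag rows where the decomposition leaks by more than 1% (here: none)
leaky = gap / decomp['shares_outstanding'] > 0.01
print(f"rows failing the completeness check: {leaky.sum()}")
```

In real registers, small gaps do appear (rounding, stale snapshots, unclassified residuals); flagging rather than silently rescaling keeps the problem visible.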
# ============================================================================
# Step 4: Compute Ownership Decomposition
# ============================================================================
def compute_ownership_decomposition(ownership: pd.DataFrame,
                                    prices_q: pd.DataFrame) -> pd.DataFrame:
    """
    Compute the full ownership decomposition for each stock at each
    disclosure date.

    For each stock-date combination, aggregates shares held by each
    ownership category and computes ownership ratios relative to
    total shares outstanding.

    Parameters
    ----------
    ownership : pd.DataFrame
        Classified ownership data (output of classify_shareholders)
    prices_q : pd.DataFrame
        Quarter-end price data with shares_outstanding

    Returns
    -------
    pd.DataFrame
        Stock-period level ownership decomposition with columns for
        each ownership type's share count and percentage
    """
    # Aggregate shares by ticker, date, and owner type
    agg = (ownership.groupby(['ticker', 'date', 'owner_type'])['shares_held']
           .sum()
           .reset_index())

    # Pivot to wide format: one column per ownership type
    wide = agg.pivot_table(
        index=['ticker', 'date'],
        columns='owner_type',
        values='shares_held',
        fill_value=0
    ).reset_index()

    # Rename columns
    type_cols = [c for c in wide.columns if c in OwnershipType.ALL_TYPES]
    rename_map = {t: f'shares_{t}' for t in type_cols}
    wide = wide.rename(columns=rename_map)

    # Total institutional shares
    inst_cols = [f'shares_{t}' for t in OwnershipType.INSTITUTIONAL
                 if f'shares_{t}' in wide.columns]
    wide['shares_institutional'] = wide[inst_cols].sum(axis=1)

    # Total foreign shares (for FOL tracking)
    foreign_cols = [f'shares_{t}' for t in OwnershipType.FOREIGN
                    if f'shares_{t}' in wide.columns]
    wide['shares_foreign_total'] = wide[foreign_cols].sum(axis=1)

    # Align with quarter-end dates for merging with price data
    wide['quarter_end'] = wide['date'] + pd.offsets.QuarterEnd(0)

    # Merge with price data to get shares outstanding
    merged = wide.merge(
        prices_q[['ticker', 'quarter_end', 'shares_outstanding',
                  'adjusted_shares', 'market_cap', 'exchange',
                  'industry_code', 'fol_limit', 'close']],
        on=['ticker', 'quarter_end'],
        how='left'
    )

    # Compute ownership ratios (snapshot the column list, since we
    # add pct_ columns while iterating)
    tso = merged['shares_outstanding']
    for col in list(merged.columns):
        if col.startswith('shares_') and col != 'shares_outstanding':
            ratio_col = col.replace('shares_', 'pct_')
            merged[ratio_col] = merged[col] / tso
            merged.loc[tso <= 0, ratio_col] = np.nan

    # Derived measures
    merged['pct_free_float'] = 1 - merged.get('pct_state', 0) - merged.get('pct_treasury', 0)

    # SOE flag: state ownership > 50%
    merged['is_soe'] = (merged.get('pct_state', 0) > 0.50).astype(int)

    # FOL utilization
    if 'fol_limit' in merged.columns and 'pct_foreign_total' in merged.columns:
        merged['fol_utilization'] = merged['pct_foreign_total'] / merged['fol_limit']
        merged['foreign_room'] = merged['fol_limit'] - merged['pct_foreign_total']
        merged.loc[merged['fol_limit'] <= 0, ['fol_utilization', 'foreign_room']] = np.nan

    # Number of institutional owners (breadth)
    n_owners = (ownership[ownership['owner_type'].isin(OwnershipType.INSTITUTIONAL)]
                .groupby(['ticker', 'date'])['shareholder_name']
                .nunique()
                .reset_index()
                .rename(columns={'shareholder_name': 'n_inst_owners'}))
    n_foreign_owners = (ownership[ownership['owner_type'] == OwnershipType.FOREIGN_INST]
                        .groupby(['ticker', 'date'])['shareholder_name']
                        .nunique()
                        .reset_index()
                        .rename(columns={'shareholder_name': 'n_foreign_owners'}))
    merged = merged.merge(n_owners, on=['ticker', 'date'], how='left')
    merged = merged.merge(n_foreign_owners, on=['ticker', 'date'], how='left')
    merged[['n_inst_owners', 'n_foreign_owners']] = (
        merged[['n_inst_owners', 'n_foreign_owners']].fillna(0)
    )

    print(f"Ownership decomposition computed:")
    print(f"  Stock-period observations: {len(merged):,}")
    print(f"  Unique tickers: {merged['ticker'].nunique():,}")
    print(f"\nMean ownership structure:")
    pct_cols = [c for c in merged.columns if c.startswith('pct_')]
    print(merged[pct_cols].mean().round(4).to_string())

    return merged

# ownership_decomp = compute_ownership_decomposition(
#     ownership_classified, prices_q
# )

33.5 Institutional Ownership Measures
33.5.1 Ownership Ratio
The Institutional Ownership Ratio (IOR) for stock \(i\) at time \(t\) in Vietnam is:
\[ IOR_{i,t} = \frac{S_{i,t}^{state} + S_{i,t}^{foreign\_inst} + S_{i,t}^{domestic\_inst}}{TSO_{i,t}} \tag{33.1}\]
where \(S_{i,t}^{type}\) denotes adjusted shares held by each ownership category and \(TSO_{i,t}\) is total shares outstanding. Unlike the US where the IOR can exceed 100% due to long-only reporting and short selling, the Vietnamese IOR is bounded by construction in \([0, 1]\) because we observe the complete ownership decomposition.
We also compute category-specific ownership ratios:
\[ \begin{aligned} IOR_{i,t}^{foreign} &= \frac{S_{i,t}^{foreign\_inst}}{TSO_{i,t}},\\ IOR_{i,t}^{state} &= \frac{S_{i,t}^{state}}{TSO_{i,t}},\\ IOR_{i,t}^{domestic} &= \frac{S_{i,t}^{domestic\_inst}}{TSO_{i,t}} \end{aligned} \tag{33.2}\]
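A quick numeric instance of Equations 33.1 and 33.2, with hypothetical holdings:

```python
# Hypothetical stock: 1.0bn shares outstanding; state holds 300m,
# foreign institutions 150m, domestic institutions 50m.
tso = 1_000_000_000
s_state, s_foreign, s_domestic = 300_000_000, 150_000_000, 50_000_000

ior = (s_state + s_foreign + s_domestic) / tso   # Equation 33.1
ior_foreign = s_foreign / tso                    # Equation 33.2
ior_state = s_state / tso

print(ior, ior_foreign, ior_state)
# The residual 0.50 is individual plus treasury ownership, observed
# directly rather than inferred, so ior can never exceed 1.
```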
33.5.2 Concentration: Herfindahl-Hirschman Index
The Institutional Ownership Concentration via the Herfindahl-Hirschman Index is:
\[ IOC_{i,t}^{HHI} = \sum_{j=1}^{N_{i,t}} \left(\frac{S_{i,j,t}}{\sum_{k=1}^{N_{i,t}} S_{i,k,t}}\right)^2 \tag{33.3}\]
In Vietnam, the HHI is particularly informative because it captures the dominance of state shareholders. A company where the government holds 65% will have a mechanically high HHI even if the remaining 35% is diversely held.
We therefore compute separate HHI measures for different ownership categories:
\[ HHI_{i,t}^{total} = \sum_{j} w_{i,j,t}^2, \quad HHI_{i,t}^{non-state} = \sum_{j \notin state} \left(\frac{S_{i,j,t}}{\sum_{k \notin state} S_{i,k,t}}\right)^2 \tag{33.4}\]
The non-state HHI is more comparable to the US institutional HHI, as it captures concentration among market-driven investors.
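The state-dominance point is easy to see numerically. A sketch with a hypothetical register: the state holds 65% and ten funds split the remaining 35% evenly, so the total HHI is high even though the non-state block is perfectly diversified.

```python
import numpy as np

# Hypothetical register: one 65% state block plus ten equal 3.5% fund stakes
holdings = np.array([0.65] + [0.035] * 10)   # sums to 1.0

# Total HHI over all institutional holders (Equation 33.3)
w = holdings / holdings.sum()
hhi_total = (w ** 2).sum()                   # dominated by the 65% stake

# Non-state HHI: reweight after dropping the state block (Equation 33.4)
non_state = holdings[1:]
w_ns = non_state / non_state.sum()
hhi_non_state = (w_ns ** 2).sum()            # ten equal holders -> 1/10

print(f"total HHI = {hhi_total:.3f}, non-state HHI = {hhi_non_state:.3f}")
```

The total HHI of roughly 0.43 signals heavy concentration, while the non-state HHI of 0.10 correctly reports a diversified investor base once the mechanical state block is removed.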
33.5.3 Breadth of Ownership
Following Chen, Hong, and Stein (2002), Institutional Breadth (\(N_{i,t}\)) is the number of institutional investors holding stock \(i\) in period \(t\). The Change in Breadth is:
\[ \Delta Breadth_{i,t} = \frac{N_{i,t}^{cont} - N_{i,t-1}^{cont}}{TotalInstitutions_{t-1}} \tag{33.5}\]
where \(N_{i,t}^{cont}\) counts only institutions that appear in the disclosure universe in both periods \(t\) and \(t-1\), following the Lehavy and Sloan (2008) algorithm. This adjustment is particularly important in Vietnam where:
- New funds launch frequently (especially ETFs tracking VN30)
- Foreign funds enter and exit the market
- Domestic securities firms consolidate or spin off asset management divisions
# ============================================================================
# Step 5: Compute All IO Metrics
# ============================================================================
def compute_io_metrics_vietnam(ownership: pd.DataFrame,
                               ownership_decomp: pd.DataFrame,
                               adj_factors: pd.DataFrame) -> pd.DataFrame:
    """
    Compute security-level institutional ownership metrics adapted for Vietnam.

    Computes:
    1. Ownership ratios by category (state, foreign, domestic inst, individual)
    2. HHI concentration (total, non-state, foreign-only)
    3. Number of institutional owners (total, foreign, domestic)
    4. Change in breadth (Lehavy-Sloan adjusted)
    5. FOL-related metrics (utilization, room, near-cap indicator)

    Parameters
    ----------
    ownership : pd.DataFrame
        Classified ownership data with individual shareholder records
    ownership_decomp : pd.DataFrame
        Aggregated ownership decomposition (output of compute_ownership_decomposition)
    adj_factors : pd.DataFrame
        Corporate action adjustment factors

    Returns
    -------
    pd.DataFrame
        Stock-period level metrics
    """
    # Start with the ownership decomposition
    metrics = ownership_decomp.copy()

    # --- HHI Concentration ---
    # Total HHI: across all institutional shareholders
    inst_ownership = ownership[
        ownership['owner_type'].isin(OwnershipType.INSTITUTIONAL)
    ].copy()

    def compute_hhi_group(group):
        """Compute HHI for a group of shareholders."""
        total = group['shares_held'].sum()
        if total <= 0:
            return np.nan
        weights = group['shares_held'] / total
        return (weights ** 2).sum()

    # Total institutional HHI
    hhi_total = (inst_ownership.groupby(['ticker', 'date'])
                 .apply(compute_hhi_group)
                 .reset_index(name='hhi_institutional'))
    metrics = metrics.merge(hhi_total, on=['ticker', 'date'], how='left')

    # Non-state HHI (exclude state shareholders)
    non_state = ownership[
        ownership['owner_type'].isin([OwnershipType.FOREIGN_INST,
                                      OwnershipType.DOMESTIC_INST])
    ]
    hhi_nonstate = (non_state.groupby(['ticker', 'date'])
                    .apply(compute_hhi_group)
                    .reset_index(name='hhi_non_state'))
    metrics = metrics.merge(hhi_nonstate, on=['ticker', 'date'], how='left')

    # Foreign-only HHI
    foreign_only = ownership[ownership['owner_type'] == OwnershipType.FOREIGN_INST]
    hhi_foreign = (foreign_only.groupby(['ticker', 'date'])
                   .apply(compute_hhi_group)
                   .reset_index(name='hhi_foreign'))
    metrics = metrics.merge(hhi_foreign, on=['ticker', 'date'], how='left')

    # --- Change in Breadth (Lehavy-Sloan Algorithm) ---
    metrics = metrics.sort_values(['ticker', 'date'])

    # Set of all institutions disclosed in each period
    inst_by_period = (inst_ownership.groupby('date')['shareholder_name']
                      .apply(set)
                      .to_dict())

    # For each stock-period: count continuing institutions
    def compute_breadth_change(group):
        group = group.sort_values('date').reset_index(drop=True)
        group['dbreadth'] = np.nan
        for i in range(1, len(group)):
            current_date = group.loc[i, 'date']
            prev_date = group.loc[i - 1, 'date']
            # Institutions in the universe for both periods
            current_universe = inst_by_period.get(current_date, set())
            prev_universe = inst_by_period.get(prev_date, set())
            continuing_universe = current_universe & prev_universe
            if len(prev_universe) == 0:
                continue
            # Count continuing institutions holding this stock in each period
            ticker = group.loc[i, 'ticker']
            current_holders = set(
                inst_ownership[
                    (inst_ownership['ticker'] == ticker) &
                    (inst_ownership['date'] == current_date)
                ]['shareholder_name']
            )
            prev_holders = set(
                inst_ownership[
                    (inst_ownership['ticker'] == ticker) &
                    (inst_ownership['date'] == prev_date)
                ]['shareholder_name']
            )
            # Count only continuing institutions
            n_current_cont = len(current_holders & continuing_universe)
            n_prev_cont = len(prev_holders & continuing_universe)
            group.loc[i, 'dbreadth'] = (
                (n_current_cont - n_prev_cont) / len(prev_universe)
            )
        return group

    metrics = metrics.groupby('ticker', group_keys=False).apply(compute_breadth_change)

    # --- FOL Indicators ---
    if 'fol_utilization' in metrics.columns:
        metrics['near_fol_cap'] = (metrics['fol_utilization'] > 0.90).astype(int)
metrics['at_fol_cap'] = (metrics['fol_utilization'] > 0.98).astype(int)
print(f"IO metrics computed for Vietnam:")
print(f" Observations: {len(metrics):,}")
print(f"\nKey metric distributions:")
summary_cols = ['pct_institutional', 'pct_state', 'pct_foreign_total',
'hhi_institutional', 'n_inst_owners', 'dbreadth']
summary_cols = [c for c in summary_cols if c in metrics.columns]
print(metrics[summary_cols].describe().round(4).to_string())
return metrics
# io_metrics = compute_io_metrics_vietnam(
# ownership_classified, ownership_decomp, adj_factors
# )
33.5.4 Time Series Visualization
def plot_ownership_timeseries_vietnam(metrics: pd.DataFrame):
"""
Create publication-quality time series plots of Vietnamese
ownership structure evolution.
"""
fig, axes = plt.subplots(3, 1, figsize=(12, 14))
# Aggregate across all stocks (market-cap weighted)
ts = metrics.groupby('quarter_end').apply(
lambda g: pd.Series({
'pct_state': np.average(g['pct_state'].fillna(0),
weights=g['market_cap'].fillna(1)),
'pct_foreign': np.average(g['pct_foreign_total'].fillna(0),
weights=g['market_cap'].fillna(1)),
'pct_domestic_inst': np.average(g['pct_domestic_inst'].fillna(0),
weights=g['market_cap'].fillna(1)),
'pct_individual': np.average(g['pct_individual'].fillna(0),
weights=g['market_cap'].fillna(1)),
'n_stocks': g['ticker'].nunique(),
'total_mktcap': g['market_cap'].sum(),
'median_n_inst': g['n_inst_owners'].median(),
'median_hhi': g['hhi_institutional'].median(),
'pct_soe': g['is_soe'].mean(),
})
).reset_index()
# ---- Panel A: Ownership Composition (Stacked Area) ----
ax = axes[0]
dates = ts['quarter_end']
ax.stackplot(dates,
ts['pct_state'] * 100,
ts['pct_foreign'] * 100,
ts['pct_domestic_inst'] * 100,
ts['pct_individual'] * 100,
labels=['State', 'Foreign Institutional',
'Domestic Institutional', 'Individual'],
colors=[OWNER_COLORS['State'], OWNER_COLORS['Foreign Institutional'],
OWNER_COLORS['Domestic Institutional'], OWNER_COLORS['Individual']],
alpha=0.8)
ax.set_ylabel('Ownership Share (%)')
ax.set_title('Panel A: Ownership Composition of Vietnamese Listed Companies '
'(Market-Cap Weighted)')
ax.legend(loc='upper right', frameon=True, framealpha=0.9)
ax.set_ylim(0, 100)
# ---- Panel B: Institutional Ownership by Component ----
ax = axes[1]
ax.plot(dates, ts['pct_state'] * 100, label='State',
color=OWNER_COLORS['State'], linewidth=2)
ax.plot(dates, ts['pct_foreign'] * 100, label='Foreign Institutional',
color=OWNER_COLORS['Foreign Institutional'], linewidth=2)
ax.plot(dates, ts['pct_domestic_inst'] * 100, label='Domestic Institutional',
color=OWNER_COLORS['Domestic Institutional'], linewidth=2)
total_inst = (ts['pct_state'] + ts['pct_foreign'] + ts['pct_domestic_inst']) * 100
ax.plot(dates, total_inst, label='Total Institutional',
color=OWNER_COLORS['Total Institutional'], linewidth=2.5, linestyle='--')
ax.set_ylabel('Ownership Ratio (%)')
ax.set_title('Panel B: Institutional Ownership Components')
ax.legend(loc='upper left', frameon=True, framealpha=0.9)
# ---- Panel C: Market Structure ----
ax = axes[2]
ax2 = ax.twinx()
ax.plot(dates, ts['n_stocks'], color='#1f77b4', linewidth=2, label='# Listed Stocks')
ax2.plot(dates, ts['total_mktcap'] / 1000, color='#d62728', linewidth=2,
label='Total Market Cap (Trillion VND)')
ax.set_ylabel('Number of Listed Stocks', color='#1f77b4')
ax2.set_ylabel('Market Cap (Trillion VND)', color='#d62728')
ax.set_title('Panel C: Vietnamese Stock Market Development')
# Combine legends
lines1, labels1 = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines1 + lines2, labels1 + labels2, loc='upper left', framealpha=0.9)
plt.tight_layout()
plt.savefig('fig_ownership_timeseries_vn.png', dpi=300, bbox_inches='tight')
plt.show()
# plot_ownership_timeseries_vietnam(io_metrics)
def plot_io_by_exchange_size(metrics: pd.DataFrame):
"""Plot IO ratios by exchange and size quintile."""
df = metrics[metrics['market_cap'].notna() & (metrics['market_cap'] > 0)].copy()
# Size quintiles within each quarter
df['size_quintile'] = df.groupby('quarter_end')['market_cap'].transform(
lambda x: pd.qcut(x, 5, labels=['Q1\n(Small)', 'Q2', 'Q3', 'Q4', 'Q5\n(Large)'],
duplicates='drop')
)
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)
metrics_to_plot = [
('pct_institutional', 'Total Institutional'),
('pct_foreign_total', 'Foreign Institutional'),
('pct_state', 'State'),
]
for ax, (col, title) in zip(axes, metrics_to_plot):
for exchange, color in EXCHANGE_COLORS.items():
data = df[df['exchange'] == exchange]
if len(data) == 0:
continue
means = data.groupby('size_quintile')[col].mean() * 100
ax.bar(np.arange(len(means)) + list(EXCHANGE_COLORS.keys()).index(exchange) * 0.25,
means, width=0.25, label=exchange, color=color, alpha=0.8)
ax.set_title(title)
ax.set_xlabel('Size Quintile')
if ax == axes[0]:
ax.set_ylabel('Mean Ownership (%)')
ax.legend()
ax.set_xticks(np.arange(5) + 0.25)
ax.set_xticklabels(['Q1\n(Small)', 'Q2', 'Q3', 'Q4', 'Q5\n(Large)'])
plt.tight_layout()
plt.savefig('fig_io_by_exchange_size.png', dpi=300, bbox_inches='tight')
plt.show()
# plot_io_by_exchange_size(io_metrics)
def tabulate_io_summary(metrics: pd.DataFrame, start_year: int = 2010) -> pd.DataFrame:
"""
Create publication-quality summary table of Vietnamese ownership
structure by firm size.
"""
df = metrics[
(metrics['quarter_end'].dt.year >= start_year) &
(metrics['market_cap'].notna()) & (metrics['market_cap'] > 0)
].copy()
df['size_quintile'] = df.groupby('quarter_end')['market_cap'].transform(
lambda x: pd.qcut(x, 5, labels=['Q1 (Small)', 'Q2', 'Q3', 'Q4', 'Q5 (Large)'],
duplicates='drop')
)
table = df.groupby('size_quintile').agg(
N=('ticker', 'count'),
Mean_MktCap=('market_cap', 'mean'),
Mean_IO_Total=('pct_institutional', 'mean'),
Mean_State=('pct_state', 'mean'),
Mean_Foreign=('pct_foreign_total', 'mean'),
Mean_Domestic_Inst=('pct_domestic_inst', 'mean'),
Mean_Individual=('pct_individual', 'mean'),
Median_N_Owners=('n_inst_owners', 'median'),
Median_HHI=('hhi_institutional', 'median'),
Pct_SOE=('is_soe', 'mean'),
Mean_FOL_Util=('fol_utilization', 'mean'),
).round(4)
# Format
table['N'] = table['N'].apply(lambda x: f"{x:,.0f}")
table['Mean_MktCap'] = table['Mean_MktCap'].apply(lambda x: f"{x:,.0f}B VND")
for col in ['Mean_IO_Total', 'Mean_State', 'Mean_Foreign',
'Mean_Domestic_Inst', 'Mean_Individual', 'Pct_SOE', 'Mean_FOL_Util']:
table[col] = table[col].apply(lambda x: f"{x:.1%}" if pd.notna(x) else "—")
table['Median_N_Owners'] = table['Median_N_Owners'].apply(lambda x: f"{x:.0f}")
table['Median_HHI'] = table['Median_HHI'].apply(lambda x: f"{x:.3f}" if pd.notna(x) else "—")
table.columns = ['N', 'Mean Mkt Cap', 'IO Total', 'State', 'Foreign',
'Dom. Inst.', 'Individual', 'Med. # Owners',
'Med. HHI', '% SOE', 'FOL Util.']
return table
# io_summary = tabulate_io_summary(io_metrics)
# print(io_summary.to_string())
33.6 Foreign Ownership Dynamics
33.6.1 Foreign Ownership Limits and the FOL Premium
Vietnam’s Foreign Ownership Limits create a unique market segmentation. When a stock reaches its FOL, the only way for a new foreign investor to buy is if an existing foreign holder sells. This creates a de facto “foreign-only” market for FOL-constrained stocks, with documented price premiums (Vo 2015).
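The magnitude of this segmentation is easiest to see with a toy calculation. The prices below are illustrative, not market data:

```python
# Hypothetical prices for an FOL-capped stock: a foreign-to-foreign block
# trade prints above the local board price because new foreign buyers
# cannot buy on the regular exchange
foreign_deal_price = 52_000  # VND per share, negotiated foreign-to-foreign
local_board_price = 46_000   # VND per share, on-exchange

fol_premium = foreign_deal_price / local_board_price - 1
print(f"Implied FOL premium: {fol_premium:.1%}")  # Implied FOL premium: 13.0%
```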
The FOL Utilization Ratio for stock \(i\) at time \(t\) is:
\[ FOL\_Util_{i,t} = \frac{ForeignOwnership_{i,t}}{FOL\_Limit_i} \tag{33.6}\]
Stocks are classified by FOL proximity (Table 33.4).
| FOL Zone | Utilization Range | Market Implication |
|---|---|---|
| Green | < 50% | Ample foreign room; normal trading |
| Yellow | 50-80% | Moderate room; some foreign interest pressure |
| Orange | 80-95% | Limited room; foreign premium emerging |
| Red | 95-100% | Near cap; significant foreign premium |
| Capped | ≈ 100% | At limit; foreign-only secondary market |
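The zone boundaries in Table 33.4 can be expressed as a small scalar helper; this is a minimal sketch of the same classification that `FOLAnalyzer.classify_fol_zones()` below performs vectorized:

```python
def fol_zone(utilization: float) -> str:
    """Map an FOL utilization ratio to its zone from Table 33.4."""
    if utilization < 0.50:
        return 'Green'
    if utilization < 0.80:
        return 'Yellow'
    if utilization < 0.95:
        return 'Orange'
    if utilization < 1.00:
        return 'Red'
    return 'Capped'

# A stock at 96% of its foreign ownership limit sits in the Red zone
print(fol_zone(0.96))  # Red
```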
# ============================================================================
# Step 6: Foreign Ownership Limit Analysis
# ============================================================================
class FOLAnalyzer:
"""
Analyze Foreign Ownership Limit dynamics in the Vietnamese market.
Key analyses:
1. FOL utilization tracking and classification
2. FOL premium estimation (price impact of being near cap)
3. Foreign room dynamics (opening/closing events)
4. Cross-sectional determinants of foreign ownership
"""
FOL_ZONES = {
'Green': (0, 0.50),
'Yellow': (0.50, 0.80),
'Orange': (0.80, 0.95),
'Red': (0.95, 1.00),
'Capped': (1.00, 1.50),
}
def __init__(self, io_metrics: pd.DataFrame,
foreign_daily: Optional[pd.DataFrame] = None):
"""
Parameters
----------
io_metrics : pd.DataFrame
Full ownership metrics from compute_io_metrics_vietnam()
foreign_daily : pd.DataFrame, optional
Daily foreign ownership tracking from DataCore.vn
"""
self.metrics = io_metrics.copy()
self.foreign_daily = foreign_daily
def classify_fol_zones(self) -> pd.DataFrame:
"""Classify stocks into FOL proximity zones."""
df = self.metrics.copy()
if 'fol_utilization' not in df.columns:
print("FOL utilization not available in metrics.")
return df
conditions = []
choices = []
for zone, (lo, hi) in self.FOL_ZONES.items():
conditions.append(
(df['fol_utilization'] >= lo) & (df['fol_utilization'] < hi)
)
choices.append(zone)
df['fol_zone'] = np.select(conditions, choices, default='Unknown')
# Summary
zone_dist = df.groupby('fol_zone')['ticker'].nunique()
print("FOL Zone Distribution (unique stocks):")
print(zone_dist.to_string())
return df
def estimate_fol_premium(self) -> pd.DataFrame:
"""
Estimate the FOL premium using a cross-sectional approach.
For each period, regress stock valuations (P/B or P/E) on FOL
utilization, controlling for fundamentals. The coefficient on
FOL utilization captures the premium investors pay for stocks
near their foreign ownership cap.
Alternative: Compare returns of stocks transitioning between
FOL zones as a natural experiment.
"""
df = self.metrics.copy()
df = df[df['fol_utilization'].notna() & df['market_cap'].notna()].copy()
# FOL zone dummies
df['near_cap'] = (df['fol_utilization'] > 0.90).astype(int)
df['at_cap'] = (df['fol_utilization'] > 0.98).astype(int)
# Price-to-book as valuation measure
# (Assumes 'equity' is available from financial data)
if 'equity' in df.columns:
df['pb_ratio'] = df['market_cap'] * 1e9 / df['equity']
# Log market cap is the valuation proxy used in the regressions below,
# so compute it regardless of whether 'equity' is available
df['log_mktcap'] = np.log(df['market_cap'])
# Fama-MacBeth style: run cross-sectional regressions each period
results = []
for quarter, group in df.groupby('quarter_end'):
group = group.dropna(subset=['fol_utilization', 'log_mktcap'])
if len(group) < 50:
continue
y = group['log_mktcap']
X = sm.add_constant(group[['fol_utilization', 'pct_state',
'n_inst_owners']])
try:
model = sm.OLS(y, X).fit()
results.append({
'quarter': quarter,
'beta_fol': model.params.get('fol_utilization', np.nan),
'tstat_fol': model.tvalues.get('fol_utilization', np.nan),
'r2': model.rsquared,
'n': len(group),
})
except Exception:
continue
if results:
results_df = pd.DataFrame(results)
print("FOL Premium (Fama-MacBeth Regression):")
print(f" Mean β(FOL_util): {results_df['beta_fol'].mean():.4f}")
tstat = results_df['beta_fol'].mean() / (results_df['beta_fol'].std() / np.sqrt(len(results_df)))
print(f" t-statistic: {tstat:.2f}")
return results_df
return pd.DataFrame()
def analyze_foreign_room_events(self) -> pd.DataFrame:
"""
Analyze events where foreign room opens or closes.
Room-opening events (FOL cap raised, foreign seller exits) can
trigger significant price movements as pent-up foreign demand
is released. Room-closing events (approaching cap) can create
selling pressure as foreign investors anticipate illiquidity.
"""
if self.foreign_daily is None:
print("Daily foreign ownership data required for event analysis.")
return pd.DataFrame()
df = self.foreign_daily.copy()
df = df.sort_values(['ticker', 'date'])
# Compute daily change in foreign room
df['foreign_room_change'] = df.groupby('ticker')['foreign_room'].diff()
# Identify room-opening events (room increases by > 1 percentage point)
df['room_open_event'] = (df['foreign_room_change'] > 0.01).astype(int)
# Identify room-closing events (room decreases to < 2%)
df['room_close_event'] = (
(df['foreign_room'] < 0.02) &
(df.groupby('ticker')['foreign_room'].shift(1) >= 0.02)
).astype(int)
events = df[
(df['room_open_event'] == 1) | (df['room_close_event'] == 1)
].copy()
print(f"Foreign room events identified:")
print(f" Room-opening events: {df['room_open_event'].sum():,}")
print(f" Room-closing events: {df['room_close_event'].sum():,}")
return events
# fol_analyzer = FOLAnalyzer(io_metrics, dc.foreign_ownership)
# fol_classified = fol_analyzer.classify_fol_zones()
# fol_premium = fol_analyzer.estimate_fol_premium()
def plot_fol_utilization(metrics: pd.DataFrame):
"""Plot FOL utilization distribution by sector."""
df = metrics[metrics['fol_utilization'].notna()].copy()
# Assign broad sectors
sector_map = {
'Banking': ['VCB', 'BID', 'CTG', 'TCB', 'VPB', 'MBB', 'ACB', 'HDB', 'STB', 'TPB'],
'Real Estate': ['VHM', 'VIC', 'NVL', 'KDH', 'DXG', 'HDG', 'VRE'],
'Technology': ['FPT', 'CMG', 'FOX'],
'Consumer': ['VNM', 'MSN', 'SAB', 'MWG', 'PNJ'],
}
fig, ax = plt.subplots(figsize=(10, 6))
for sector, tickers in sector_map.items():
data = df[df['ticker'].isin(tickers)]['fol_utilization']
if len(data) > 0:
ax.hist(data * 100, bins=30, alpha=0.4, label=sector, density=True)
ax.axvline(x=30, color='red', linestyle='--', alpha=0.7, label='Banking FOL (30%)')
ax.axvline(x=49, color='blue', linestyle='--', alpha=0.7, label='Standard FOL (49%)')
ax.set_xlabel('FOL Utilization (%)')
ax.set_ylabel('Density')
ax.set_title('Foreign Ownership Limit Utilization Distribution')
ax.legend()
plt.tight_layout()
plt.savefig('fig_fol_utilization.png', dpi=300, bbox_inches='tight')
plt.show()
# plot_fol_utilization(io_metrics)
33.7 Institutional Trades
33.7.1 Trade Inference in Vietnam
In the US, institutional trades are inferred from quarterly 13F holding snapshots. In Vietnam, the challenge is more acute because disclosure frequency varies:
- Major shareholders (\(\ge\) 5%): Must disclose within 7 business days of crossing ownership thresholds (5%, 10%, 15%, 20%, 25%, 50%, 65%, 75%)
- Fund portfolio reports: Semi-annual disclosure required; some funds report quarterly
- Annual reports: Provide complete shareholder register but only once per year
- Daily foreign ownership: HOSE/HNX publish aggregate daily foreign buy/sell data
We derive trades from the change in ownership between consecutive disclosure dates, applying the same logic as the Ben-David et al. (2012) algorithm for US 13F data, adapted for Vietnam’s irregular disclosure intervals.
# ============================================================================
# Step 7: Derive Institutional Trades
# ============================================================================
def derive_trades_vietnam(ownership: pd.DataFrame,
adj_factors: pd.DataFrame) -> pd.DataFrame:
"""
Derive institutional trades from changes in ownership disclosures.
Adapted from Ben-David, Franzoni, and Moussawi (2012) for
Vietnam's irregular disclosure frequency.
Key differences from US approach:
1. Disclosure intervals are irregular (not always quarterly)
2. We observe ALL institutional types, not just 13F filers
3. No $100M AUM threshold (we see all institutional holders)
4. Must adjust for corporate actions between disclosure dates
Trade types:
+1: Initiating Buy (new position)
+2: Incremental Buy (increased existing position)
-1: Terminating Sale (fully exited position)
-2: Incremental Sale (reduced existing position)
Parameters
----------
ownership : pd.DataFrame
Classified ownership with: ticker, date, shareholder_name,
shares_held, owner_type
adj_factors : pd.DataFrame
Corporate action adjustment factors
Returns
-------
pd.DataFrame
Trade-level data: date, shareholder_name, ticker, trade,
buysale, owner_type
"""
# Focus on institutional shareholders only
inst = ownership[
ownership['owner_type'].isin(OwnershipType.INSTITUTIONAL)
].copy()
inst = inst.sort_values(['shareholder_name', 'ticker', 'date']).reset_index(drop=True)
trades_list = []
for (shareholder, ticker), group in inst.groupby(['shareholder_name', 'ticker']):
group = group.reset_index(drop=True)
for i in range(len(group)):
current = group.iloc[i]
current_date = current['date']
current_shares = current['shares_held']
owner_type = current['owner_type']
if i == 0:
# First observation: we cannot distinguish a new position from one
# that predates our sample window, so skip it rather than record
# a false initiating buy
continue
prev = group.iloc[i - 1]
prev_date = prev['date']
prev_shares = prev['shares_held']
# Adjust previous shares for corporate actions between dates
prev_shares_adj = adjust_shares(
prev_shares, ticker, prev_date, current_date, adj_factors
)
# Compute trade (in adjusted shares)
trade = current_shares - prev_shares_adj
# Classify trade type
if abs(trade) < 1: # De minimis threshold
continue
if prev_shares_adj <= 0 and current_shares > 0:
buysale = 1 # Initiating buy
elif prev_shares_adj > 0 and current_shares <= 0:
buysale = -1 # Terminating sale
elif trade > 0:
buysale = 2 # Incremental buy
else:
buysale = -2 # Incremental sale
trades_list.append({
'date': current_date,
'shareholder_name': shareholder,
'ticker': ticker,
'trade': trade,
'prev_shares_adj': prev_shares_adj,
'current_shares': current_shares,
'buysale': buysale,
'owner_type': owner_type,
'days_between': (current_date - prev_date).days,
})
trades = pd.DataFrame(trades_list)
if len(trades) > 0:
print(f"Trades derived: {len(trades):,}")
print(f"\nTrade type distribution:")
labels = {1: 'Initiating Buy', 2: 'Incremental Buy',
-1: 'Terminating Sale', -2: 'Incremental Sale'}
for bs, label in sorted(labels.items()):
n = (trades['buysale'] == bs).sum()
print(f" {label}: {n:,} ({n/len(trades):.1%})")
print(f"\nBy owner type:")
print(trades.groupby('owner_type')['trade'].agg(['count', 'mean', 'median'])
.round(0).to_string())
return trades
# trades = derive_trades_vietnam(ownership_classified, adj_factors)
When computing trades as \(\Delta Shares = Shares_t - Shares_{t-1}\), the previous period’s shares must be adjusted for any corporate actions between \(t-1\) and \(t\). If VNM issued a 20% stock dividend between the two disclosure dates, then 1,000 shares at \(t-1\) should be compared to 1,200 adjusted shares, not 1,000 raw shares. Failing to make this adjustment would create a phantom “buy” of 200 shares that never actually occurred.
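The VNM arithmetic can be verified with a minimal sketch of the adjustment step. Here `adjust_shares_sketch`, the `adj_factors` columns (`ticker`, `action_date`, `factor`), and the dates are illustrative assumptions standing in for the chapter's `adjust_shares()` helper:

```python
import pandas as pd

def adjust_shares_sketch(shares, ticker, start, end, adj_factors):
    """Multiply shares by every adjustment factor dated between start and end."""
    between = (
        (adj_factors['ticker'] == ticker) &
        (adj_factors['action_date'] > start) &
        (adj_factors['action_date'] <= end)
    )
    return shares * adj_factors.loc[between, 'factor'].prod()

# Hypothetical adjustment-factor table: one 20% stock dividend for VNM,
# so each old share becomes 1.2 shares
adj = pd.DataFrame({
    'ticker': ['VNM'],
    'action_date': [pd.Timestamp('2023-05-15')],
    'factor': [1.2],
})
prev_adj = adjust_shares_sketch(
    1_000, 'VNM', pd.Timestamp('2023-03-31'), pd.Timestamp('2023-06-30'), adj
)
# Compare 1,200 (not 1,000) to the shares held at t, avoiding the
# phantom 200-share "buy"
print(round(prev_adj))  # 1200
```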
def derive_trades_vectorized_vietnam(ownership: pd.DataFrame,
adj_factors: pd.DataFrame) -> pd.DataFrame:
"""
Vectorized version of Vietnamese trade derivation.
Uses pandas groupby and vectorized operations instead of Python loops.
Approximately 20-50x faster for large datasets.
Note: Corporate action adjustment is applied per-group, which still
requires some iteration but is much faster than row-by-row.
"""
inst = ownership[
ownership['owner_type'].isin(OwnershipType.INSTITUTIONAL) &
(ownership['shares_held'] > 0)
].copy()
inst = inst.sort_values(['shareholder_name', 'ticker', 'date']).reset_index(drop=True)
# Lagged values
inst['prev_date'] = inst.groupby(['shareholder_name', 'ticker'])['date'].shift(1)
inst['prev_shares'] = inst.groupby(['shareholder_name', 'ticker'])['shares_held'].shift(1)
inst['is_first'] = inst['prev_date'].isna()
# Remove first observations (no prior to compare)
inst = inst[~inst['is_first']].copy()
# Adjust previous shares for corporate actions
# Vectorized: for each row, apply adjustment between prev_date and date
def adjust_row(row):
return adjust_shares(
row['prev_shares'], row['ticker'],
row['prev_date'], row['date'], adj_factors
)
inst['prev_shares_adj'] = inst.apply(adjust_row, axis=1)
# Compute trade
inst['trade'] = inst['shares_held'] - inst['prev_shares_adj']
inst['days_between'] = (inst['date'] - inst['prev_date']).dt.days
# Classify trade type
inst['buysale'] = np.select(
[
(inst['prev_shares_adj'] <= 0) & (inst['shares_held'] > 0),
(inst['prev_shares_adj'] > 0) & (inst['shares_held'] <= 0),
inst['trade'] > 0,
inst['trade'] < 0,
],
[1, -1, 2, -2],
default=0
)
# Remove zero trades
trades = inst[inst['buysale'] != 0].copy()
trades = trades[['date', 'shareholder_name', 'ticker', 'trade',
'buysale', 'owner_type', 'days_between',
'prev_shares_adj', 'shares_held']].copy()
trades = trades.rename(columns={'shares_held': 'current_shares'})
print(f"Vectorized trades: {len(trades):,}")
return trades
# trades = derive_trades_vectorized_vietnam(ownership_classified, adj_factors)
33.8 Fund-Level Flows and Turnover
33.8.1 Portfolio Assets and Returns from Fund Holdings
Using DataCore.vn’s fund holdings data, we compute fund-level portfolio analytics analogous to the US 13F approach:
\[ Assets_{j,t} = \sum_{i=1}^{N_{j,t}} S_{i,j,t} \times P_{i,t} \tag{33.7}\]
\[ R_{j,t \to t+1}^{holdings} = \frac{\sum_{i} S_{i,j,t} \times P_{i,t} \times R_{i,t \to t+1}}{\sum_{i} S_{i,j,t} \times P_{i,t}} \tag{33.8}\]
\[ NetFlows_{j,t} = Assets_{j,t} - Assets_{j,t-1} \times (1 + R_{j,t-1 \to t}^{holdings}) \tag{33.9}\]
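Equations (33.7)–(33.9) reduce to simple arithmetic once assets and the holdings return are in hand. A toy check with made-up numbers:

```python
# Toy numbers for Eq. (33.9), in billion VND: a fund reports assets of
# 100 at t-1 and 112 at t, while its holdings returned 5% over the period
assets_prev = 100.0
assets_now = 112.0
holdings_return = 0.05

# Asset growth not explained by the holdings return is attributed to net flows
net_flows = assets_now - assets_prev * (1 + holdings_return)
print(round(net_flows, 2))  # 7.0 billion VND of net inflows
```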
33.8.2 Turnover Measures
Following Carhart (1997), adapted for Vietnam’s fund reporting:
\[ Turnover_{j,t}^{Carhart} = \frac{\min(TotalBuys_{j,t}, TotalSales_{j,t})}{\overline{Assets}_{j,t}} \tag{33.10}\]
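A quick numeric sketch of Eq. (33.10), with hypothetical buy and sell totals:

```python
# Hypothetical semi-annual totals, in billion VND
total_buys = 30.0
total_sales = 22.0
avg_assets = 110.0  # average of beginning- and end-of-period assets

# Eq. (33.10): taking min(buys, sales) nets out flow-driven trading,
# isolating discretionary portfolio turnover
turnover = min(total_buys, total_sales) / avg_assets
print(turnover)  # 0.2
```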
# ============================================================================
# Step 8: Fund-Level Portfolio Analytics
# ============================================================================
def compute_fund_analytics(fund_holdings: pd.DataFrame,
prices_q: pd.DataFrame,
adj_factors: pd.DataFrame) -> Dict:
"""
Compute fund-level portfolio analytics from DataCore.vn fund holdings.
Vietnamese fund disclosure is typically semi-annual (some quarterly),
which limits the frequency of these analytics compared to the US
quarterly approach.
Returns
-------
dict with keys:
'fund_assets': pd.DataFrame of fund-level assets and returns
'fund_trades': pd.DataFrame of fund-level derived trades
'fund_aggregates': pd.DataFrame of flows and turnover
"""
fh = fund_holdings.copy()
fh = fh[fh['shares_held'] > 0].copy()
# Merge with prices
fh = fh.merge(
prices_q[['ticker', 'quarter_end', 'close', 'adjusted_close', 'ret']],
left_on=['ticker', 'report_date'],
right_on=['ticker', 'quarter_end'],
how='inner'
)
# Portfolio value
fh['holding_value'] = fh['shares_held'] * fh['close']
# --- Fund-Level Assets ---
fund_assets = fh.groupby(['fund_name', 'report_date']).agg(
total_assets=('holding_value', lambda x: x.sum() / 1e9), # Billion VND
n_stocks=('ticker', 'nunique'),
).reset_index()
# Holdings return (value-weighted)
fh['weight'] = fh.groupby(['fund_name', 'report_date'])['holding_value'].transform(
lambda x: x / x.sum()
)
fund_hret = (fh.groupby(['fund_name', 'report_date'])
.apply(lambda g: np.average(g['ret'].fillna(0), weights=g['weight']))
.reset_index(name='holdings_return'))
fund_assets = fund_assets.merge(fund_hret, on=['fund_name', 'report_date'])
# --- Fund-Level Trades ---
# Derive trades from changes in holdings
fh_sorted = fh.sort_values(['fund_name', 'ticker', 'report_date'])
fh_sorted['prev_shares'] = fh_sorted.groupby(['fund_name', 'ticker'])['shares_held'].shift(1)
fh_sorted['prev_date'] = fh_sorted.groupby(['fund_name', 'ticker'])['report_date'].shift(1)
# Adjust for corporate actions
fh_sorted['prev_shares_adj'] = fh_sorted.apply(
lambda r: adjust_shares(r['prev_shares'], r['ticker'],
r['prev_date'], r['report_date'], adj_factors)
if pd.notna(r['prev_shares']) else np.nan,
axis=1
)
fh_sorted['trade'] = fh_sorted['shares_held'] - fh_sorted['prev_shares_adj']
fh_sorted['trade_value'] = fh_sorted['trade'] * fh_sorted['close'] / 1e9 # Billion VND
# Aggregate buys and sells per fund-period
fund_trades = fh_sorted[fh_sorted['trade'].notna()].copy()
fund_flows = fund_trades.groupby(['fund_name', 'report_date']).agg(
total_buys=('trade_value', lambda x: x[x > 0].sum()),
total_sales=('trade_value', lambda x: -x[x < 0].sum()),
).reset_index()
# --- Fund-Level Aggregates ---
fund_agg = fund_assets.merge(fund_flows, on=['fund_name', 'report_date'], how='left')
fund_agg[['total_buys', 'total_sales']] = fund_agg[['total_buys', 'total_sales']].fillna(0)
fund_agg = fund_agg.sort_values(['fund_name', 'report_date'])
fund_agg['lag_assets'] = fund_agg.groupby('fund_name')['total_assets'].shift(1)
fund_agg['lag_hret'] = fund_agg.groupby('fund_name')['holdings_return'].shift(1)
# Net flows
fund_agg['net_flows'] = (fund_agg['total_assets'] -
fund_agg['lag_assets'] * (1 + fund_agg['holdings_return']))
# Turnover (Carhart definition)
fund_agg['avg_assets'] = (fund_agg['total_assets'] + fund_agg['lag_assets']) / 2
fund_agg['turnover'] = (
fund_agg[['total_buys', 'total_sales']].min(axis=1) / fund_agg['avg_assets']
)
# Annualize (approximate, since disclosure may be semi-annual)
fund_agg['periods_per_year'] = 365 / fund_agg.groupby('fund_name')['report_date'].diff().dt.days
fund_agg['turnover_annual'] = fund_agg['turnover'] * fund_agg['periods_per_year'].fillna(2)
print(f"Fund analytics computed:")
print(f" Unique funds: {fund_agg['fund_name'].nunique():,}")
print(f" Fund-period observations: {len(fund_agg):,}")
print(f"\nTurnover statistics:")
print(fund_agg[['turnover', 'turnover_annual']].describe().round(4))
return {
'fund_assets': fund_assets,
'fund_trades': fund_trades,
'fund_aggregates': fund_agg,
}
# fund_analytics = compute_fund_analytics(dc.fund_holdings, prices_q, adj_factors)
33.9 State Ownership Analysis
33.9.1 Equitization and the Decline of State Ownership
Vietnam’s equitization (cổ phần hóa) program has been a defining feature of the market since the early 2000s. The program converts state-owned enterprises into joint-stock companies, typically with the state retaining a controlling or significant minority stake that is then gradually reduced through secondary offerings.
# ============================================================================
# Step 9: State Ownership Analysis
# ============================================================================
def analyze_state_ownership(metrics: pd.DataFrame) -> Dict:
"""
Comprehensive analysis of state ownership in Vietnam.
Computes:
1. Aggregate state ownership trends
2. SOE population dynamics (entry/exit from SOE classification)
3. Equitization event detection (large drops in state ownership)
4. State ownership by sector and size
5. Governance implications (state as blockholder)
"""
df = metrics.copy()
# --- 1. Aggregate Trends ---
ts = df.groupby('quarter_end').agg(
n_soe=('is_soe', 'sum'),
n_total=('ticker', 'nunique'),
pct_soe=('is_soe', 'mean'),
mean_state_pct=('pct_state', 'mean'),
median_state_pct=('pct_state', 'median'),
# Market cap share of SOEs
soe_mktcap=('market_cap', lambda x: x[df.loc[x.index, 'is_soe'] == 1].sum()),
total_mktcap=('market_cap', 'sum'),
).reset_index()
ts['soe_mktcap_share'] = ts['soe_mktcap'] / ts['total_mktcap']
# --- 2. Equitization Events ---
# Detect large drops in state ownership (>10 percentage points)
df_sorted = df.sort_values(['ticker', 'quarter_end'])
df_sorted['state_change'] = df_sorted.groupby('ticker')['pct_state'].diff()
equitization_events = df_sorted[
df_sorted['state_change'] < -0.10 # > 10pp drop
][['ticker', 'quarter_end', 'pct_state', 'state_change', 'market_cap']].copy()
# --- 3. By Sector ---
if 'industry_code' in df.columns:
by_sector = df.groupby('industry_code').agg(
mean_state=('pct_state', 'mean'),
pct_soe=('is_soe', 'mean'),
n_firms=('ticker', 'nunique'),
).sort_values('mean_state', ascending=False)
else:
by_sector = None
print(f"State Ownership Analysis:")
print(f" Current SOE count: {ts.iloc[-1]['n_soe']:.0f} / {ts.iloc[-1]['n_total']:.0f}")
print(f" SOE market cap share: {ts.iloc[-1]['soe_mktcap_share']:.1%}")
print(f" Mean state ownership: {ts.iloc[-1]['mean_state_pct']:.1%}")
print(f"\nEquitization events detected: {len(equitization_events):,}")
return {
'trends': ts,
'equitization_events': equitization_events,
'by_sector': by_sector,
}
# state_analysis = analyze_state_ownership(io_metrics)
def plot_state_ownership(state_analysis: Dict, metrics: pd.DataFrame):
"""Plot state ownership dynamics."""
fig, axes = plt.subplots(2, 1, figsize=(12, 10))
ts = state_analysis['trends']
# Panel A: SOE trends
ax = axes[0]
ax.plot(ts['quarter_end'], ts['pct_soe'] * 100,
label='% of Firms that are SOEs', linewidth=2, color='#d62728')
ax.plot(ts['quarter_end'], ts['soe_mktcap_share'] * 100,
label='SOE Market Cap Share (%)', linewidth=2, color='#1f77b4')
ax.plot(ts['quarter_end'], ts['mean_state_pct'] * 100,
label='Mean State Ownership (%)', linewidth=2, color='#2ca02c', linestyle='--')
ax.set_ylabel('Percentage')
ax.set_title('Panel A: State Ownership and SOE Prevalence Over Time')
ax.legend(frameon=True, framealpha=0.9)
# Panel B: Distribution
ax = axes[1]
# Use most recent period
latest = metrics[metrics['quarter_end'] == metrics['quarter_end'].max()]
state_pct = latest['pct_state'].dropna() * 100
ax.hist(state_pct, bins=50, color='#d62728', alpha=0.7, edgecolor='black')
ax.axvline(x=50, color='black', linestyle='--', alpha=0.7, label='50% (SOE threshold)')
ax.set_xlabel('State Ownership (%)')
ax.set_ylabel('Number of Companies')
ax.set_title('Panel B: Distribution of State Ownership (Most Recent Quarter)')
ax.legend()
plt.tight_layout()
plt.savefig('fig_state_ownership.png', dpi=300, bbox_inches='tight')
plt.show()
# plot_state_ownership(state_analysis, io_metrics)
33.10 Modern Extensions
33.10.1 Network Analysis of Co-Ownership
Institutional co-ownership networks capture how stocks are connected through shared investors. In Vietnam, these networks reveal the influence structure of major domestic conglomerates (e.g., Vingroup, Masan, FPT) and the overlap between foreign fund portfolios.
def construct_stock_coownership_network(ownership: pd.DataFrame,
period: str,
min_overlap: int = 3) -> Dict:
"""
Construct a stock-level co-ownership network.
Two stocks are connected if they share institutional investors.
Edge weight = number of shared institutional investors.
This is particularly informative in Vietnam where:
- Foreign fund portfolios concentrate on the same blue-chips
- Conglomerate cross-holdings create explicit linkages
- State ownership creates implicit connections (SCIC holds multiple stocks)
Parameters
----------
ownership : pd.DataFrame
Classified ownership data
period : str
Analysis date
min_overlap : int
Minimum shared investors to create an edge
Returns
-------
dict with network statistics and adjacency data
"""
import networkx as nx
date = pd.Timestamp(period)
# Get institutional holders for this period
inst = ownership[
(ownership['date'] == date) &
(ownership['owner_type'].isin(OwnershipType.INSTITUTIONAL))
][['ticker', 'shareholder_name', 'owner_type']].copy()
# Create bipartite mapping: institution → set of stocks held
inst_to_stocks = inst.groupby('shareholder_name')['ticker'].apply(set).to_dict()
# Stock → set of institutions
stock_to_inst = inst.groupby('ticker')['shareholder_name'].apply(set).to_dict()
# Build stock-level network
stocks = list(stock_to_inst.keys())
G = nx.Graph()
for i in range(len(stocks)):
for j in range(i + 1, len(stocks)):
shared = stock_to_inst[stocks[i]] & stock_to_inst[stocks[j]]
if len(shared) >= min_overlap:
G.add_edge(stocks[i], stocks[j], weight=len(shared),
shared_investors=list(shared)[:5]) # Store sample
# Add node attributes
for stock in stocks:
if stock in G.nodes:
G.nodes[stock]['n_inst_holders'] = len(stock_to_inst[stock])
# Network statistics
stats = {
'n_nodes': G.number_of_nodes(),
'n_edges': G.number_of_edges(),
'density': nx.density(G) if G.number_of_nodes() > 1 else 0,
'avg_clustering': nx.average_clustering(G, weight='weight') if G.number_of_nodes() > 0 else 0,
'n_components': nx.number_connected_components(G),
}
# Centrality measures
if G.number_of_nodes() > 0:
degree_cent = nx.degree_centrality(G)
stats['most_connected'] = sorted(degree_cent.items(),
key=lambda x: x[1], reverse=True)[:10]
if G.number_of_nodes() > 2:
try:
eigen_cent = nx.eigenvector_centrality_numpy(G, weight='weight')
stats['most_central'] = sorted(eigen_cent.items(),
key=lambda x: x[1], reverse=True)[:10]
except Exception:
stats['most_central'] = []
print(f"Co-Ownership Network ({period}):")
for k, v in stats.items():
if k not in ['most_connected', 'most_central']:
print(f" {k}: {v}")
if 'most_connected' in stats:
print(f"\nMost connected stocks:")
for stock, cent in stats['most_connected'][:5]:
print(f" {stock}: {cent:.3f}")
return {'graph': G, 'stats': stats}
# network = construct_stock_coownership_network(
# ownership_classified, '2024-06-30'
# )
33.10.2 ML-Enhanced Investor Classification
Vietnam’s investor classification challenge is distinct from that in the US. Where US research can rely on the Bushee typology, built from portfolio turnover and concentration, Vietnam requires classifying both investor type (when not explicitly labeled) and investor behavior (active vs. passive, short-term vs. long-term).
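One of the clustering features used below is portfolio concentration, measured as the Herfindahl-Hirschman Index (HHI) of holding values. A minimal sketch with invented numbers shows the computation:

```python
import pandas as pd

# Hypothetical holding values (VND bn) for one investor in one quarter
values = pd.Series({'VIC': 50.0, 'VNM': 30.0, 'FPT': 20.0})

# HHI = sum of squared portfolio weights; ranges from 1/n (equal
# weights) to 1.0 (a single position)
weights = values / values.sum()
hhi = (weights ** 2).sum()   # 0.5**2 + 0.3**2 + 0.2**2

print(round(hhi, 2))  # → 0.38
```

A concentrated state holder like SCIC would score near 1.0 on a single large stake, while a diversified foreign fund would score much lower — exactly the separation the k-means step exploits.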
def classify_investors_vietnam(ownership: pd.DataFrame,
prices_q: pd.DataFrame,
n_clusters: int = 4) -> pd.DataFrame:
"""
ML-based classification of Vietnamese institutional investors.
Features adapted for Vietnam's market:
1. Portfolio concentration (HHI of holdings)
2. Holding duration (average time in positions)
3. Size preference (average market cap of holdings)
4. Sector concentration
5. Foreign/domestic indicator
6. Trading frequency (inverse of average days between disclosures)
Expected clusters for Vietnam:
- Passive State Holders: SOE parents, SCIC - low turnover, concentrated
- Active Foreign Funds: Dragon Capital, VinaCapital - moderate turnover
- Domestic Securities Firms: SSI, VNDirect - high turnover, diversified
- Long-Term Foreign: Pension funds, sovereign wealth - low turnover
"""
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
inst = ownership[
ownership['owner_type'].isin(OwnershipType.INSTITUTIONAL)
].copy()
# Merge with price data
inst = inst.merge(
prices_q[['ticker', 'quarter_end', 'close', 'market_cap']],
left_on=['ticker', 'date'],
right_on=['ticker', 'quarter_end'],
how='left'
)
inst['holding_value'] = inst['shares_held'] * inst['close'].fillna(0)
# Compute features per investor-period
features = inst.groupby(['shareholder_name', 'date']).agg(
n_stocks=('ticker', 'nunique'),
total_value=('holding_value', 'sum'),
hhi_portfolio=('holding_value',
lambda x: ((x/x.sum())**2).sum() if x.sum() > 0 else np.nan),
avg_mktcap=('market_cap', 'mean'),
is_foreign=('owner_type',
lambda x: int((x == OwnershipType.FOREIGN_INST).any())),
is_state=('owner_type',
lambda x: int((x == OwnershipType.STATE).any())),
).reset_index()
# Average across all periods per investor
investor_features = features.groupby('shareholder_name').agg(
avg_n_stocks=('n_stocks', 'mean'),
avg_hhi=('hhi_portfolio', 'mean'),
avg_mktcap=('avg_mktcap', 'mean'),
avg_total_value=('total_value', 'mean'),
is_foreign=('is_foreign', 'max'),
is_state=('is_state', 'max'),
n_periods=('date', 'nunique'),
).dropna()
# Feature matrix
feature_cols = ['avg_n_stocks', 'avg_hhi', 'avg_mktcap', 'avg_total_value']
X = investor_features[feature_cols].copy()
# Log-transform
for col in feature_cols:
X[col] = np.log1p(X[col].clip(lower=0))
# Add binary features
X['is_foreign'] = investor_features['is_foreign']
X['is_state'] = investor_features['is_state']
# Standardize
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# K-means
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=20)
investor_features['cluster'] = kmeans.fit_predict(X_scaled)
# Label clusters
cluster_profiles = investor_features.groupby('cluster').agg({
'avg_n_stocks': 'mean',
'avg_hhi': 'mean',
'avg_total_value': 'mean',
'is_foreign': 'mean',
'is_state': 'mean',
'shareholder_name': 'count',
}).rename(columns={'shareholder_name': 'n_investors'})
print("Investor Clusters:")
print(cluster_profiles.round(3).to_string())
return investor_features
# investor_classes = classify_investors_vietnam(ownership_classified, prices_q)
33.10.3 Event Study: Ownership Disclosure Shocks
Vietnam’s threshold-based major-shareholder disclosure rules create natural events for studying the price impact of ownership changes.
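The abnormal-return machinery rests on the market model, E[R_i,t] = α_i + β_i × R_m,t. A self-contained sketch on synthetic returns (all numbers invented; `np.polyfit` stands in for the `statsmodels` OLS used in the full function) makes the mechanics concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
mkt = rng.normal(0.0005, 0.01, 120)                    # synthetic market returns
ret = 0.0002 + 1.2 * mkt + rng.normal(0, 0.005, 120)   # stock with beta ~ 1.2

# Estimate alpha and beta over the estimation window
beta, alpha = np.polyfit(mkt, ret, 1)

# Abnormal return on a hypothetical event day:
# actual return minus the market-model expected return
mkt_event, ret_event = 0.01, 0.03
ar = ret_event - (alpha + beta * mkt_event)

print(round(beta, 2))
```

The estimated beta lands close to the true 1.2; cumulating `ar` across the event window gives the CAR series that the function below averages by event type.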
def ownership_event_study(major_shareholders: pd.DataFrame,
prices: pd.DataFrame,
event_window: Tuple[int, int] = (-5, 20),
estimation_window: int = 120) -> pd.DataFrame:
"""
Event study of ownership disclosure announcements.
Vietnam requires major shareholders (≥5%) to disclose within 7
business days of crossing ownership thresholds. These disclosures
can be informationally significant, especially:
1. Foreign fund accumulation (signal of quality)
2. State divestiture (equitization signal)
3. Insider purchases (management confidence signal)
Uses market model for expected returns:
E[R_i,t] = α_i + β_i × R_m,t
Parameters
----------
major_shareholders : pd.DataFrame
Disclosure events from DataCore.vn
prices : pd.DataFrame
Daily stock prices
event_window : tuple
(pre_event_days, post_event_days)
estimation_window : int
Days before event window for market model estimation
"""
events = major_shareholders.copy()
events = events.sort_values(['ticker', 'date'])
# Identify significant ownership changes
events['ownership_change'] = events.groupby(
['ticker', 'shareholder_name']
)['ownership_pct'].diff()
significant_events = events[
events['ownership_change'].abs() > 0.01 # > 1 percentage point
].copy()
significant_events['event_type'] = np.where(
significant_events['ownership_change'] > 0, 'accumulation', 'divestiture'
)
# Merge with daily prices
prices_daily = prices[['ticker', 'date', 'ret']].copy()
prices_daily = prices_daily.sort_values(['ticker', 'date'])
# VN-Index as market return (ticker code depends on data provider)
if 'VNINDEX' in prices_daily['ticker'].values:
market_ret = prices_daily[prices_daily['ticker'] == 'VNINDEX'][['date', 'ret']].copy()
market_ret = market_ret.rename(columns={'ret': 'mkt_ret'})
else:
# Use equal-weighted market return as proxy
market_ret = (prices_daily.groupby('date')['ret']
.mean()
.reset_index()
.rename(columns={'ret': 'mkt_ret'}))
# For each event, compute abnormal returns
results = []
pre, post = event_window
for _, event in significant_events.iterrows():
ticker = event['ticker']
event_date = event['date']
# Get stock returns around the event
stock_ret = prices_daily[prices_daily['ticker'] == ticker].copy()
stock_ret = stock_ret.merge(market_ret, on='date', how='left')
stock_ret = stock_ret.sort_values('date').reset_index(drop=True)
# Find event date index
event_idx = stock_ret[stock_ret['date'] >= event_date].index
if len(event_idx) == 0:
continue
event_idx = event_idx[0]
# Estimation window (skip events too close to the start of the
# sample, where a negative iloc index would silently wrap around)
if event_idx + pre < 0:
continue
est_start = max(0, event_idx - estimation_window + pre)
est_end = event_idx + pre
est_data = stock_ret.iloc[est_start:est_end].dropna(subset=['ret', 'mkt_ret'])
if len(est_data) < 30:
continue
# Market model
X = sm.add_constant(est_data['mkt_ret'])
y = est_data['ret']
try:
model = sm.OLS(y, X).fit()
except Exception:
continue
# Event window abnormal returns
ew_start = event_idx + pre
ew_end = min(event_idx + post + 1, len(stock_ret))
event_data = stock_ret.iloc[ew_start:ew_end].copy()
if len(event_data) == 0:
continue
event_data['expected_ret'] = (model.params['const'] +
model.params['mkt_ret'] * event_data['mkt_ret'])
event_data['abnormal_ret'] = event_data['ret'] - event_data['expected_ret']
event_data['car'] = event_data['abnormal_ret'].cumsum()
event_data['event_day'] = range(pre, pre + len(event_data))
event_data['ticker'] = ticker
event_data['event_date'] = event_date
event_data['event_type'] = event['event_type']
event_data['ownership_change'] = event['ownership_change']
event_data['shareholder_name'] = event['shareholder_name']
results.append(event_data)
if results:
all_results = pd.concat(results, ignore_index=True)
# Average CARs by event type
avg_car = (all_results.groupby(['event_type', 'event_day'])['car']
.agg(['mean', 'std', 'count'])
.reset_index())
avg_car['t_stat'] = avg_car['mean'] / (avg_car['std'] / np.sqrt(avg_car['count']))
print(f"Event Study Results:")
print(f" Total events: {significant_events['event_type'].value_counts().to_string()}")
# CAR at event day 0, +5, +10, +20
for et in ['accumulation', 'divestiture']:
print(f"\n {et.title()} Events:")
subset = avg_car[avg_car['event_type'] == et]
for day in [0, 5, 10, 20]:
row = subset[subset['event_day'] == day]
if len(row) > 0:
print(f" CAR({day:+d}): {row.iloc[0]['mean']:.4f} "
f"(t={row.iloc[0]['t_stat']:.2f})")
return all_results
return pd.DataFrame()
# event_results = ownership_event_study(dc.major_shareholders, dc.prices)
33.11 Empirical Applications
33.11.1 Application 1: Foreign Ownership and Stock Returns in Vietnam
Does foreign institutional ownership predict returns in Vietnam? Huang, Liu, and Shu (2023) find evidence consistent with the information advantage hypothesis.
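The quintile sort at the heart of this test can be previewed on a toy cross-section (hypothetical numbers). Note the `labels=False` idiom: passing an explicit label list to `pd.qcut` together with `duplicates='drop'` raises a ValueError whenever duplicate bin edges are dropped, so integer codes plus one are the robust choice:

```python
import pandas as pd

# Hypothetical cross-section: change in foreign IO and next-quarter return
df = pd.DataFrame({
    'delta_foreign': [-0.04, -0.02, -0.01, 0.00, 0.01,
                       0.02,  0.03,  0.04, 0.05, 0.06],
    'fwd_ret':       [-0.02, -0.01,  0.00, 0.00, 0.01,
                       0.01,  0.02,  0.02, 0.03, 0.04],
})

# Integer codes 0-4, shifted to quintiles 1-5
df['q'] = pd.qcut(df['delta_foreign'], 5, labels=False,
                  duplicates='drop') + 1

port = df.groupby('q')['fwd_ret'].mean()
spread = port[5] - port[1]          # long-short: top minus bottom quintile

print(round(spread, 3))  # → 0.05
```

In the full test this sort is repeated within each quarter and the long-short spread is averaged over time, with a t-statistic on the time series of spreads.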
def test_foreign_io_returns(metrics: pd.DataFrame) -> pd.DataFrame:
"""
Test whether changes in foreign institutional ownership predict
future stock returns in Vietnam.
Methodology:
1. Sort stocks into quintiles by change in foreign IO
2. Compute equal-weighted and VN-Index-adjusted returns
3. Report portfolio returns and long-short spread
This adapts the Chen, Hong, and Stein (2002) breadth test
specifically for Vietnam's foreign ownership component.
"""
df = metrics.copy()
df = df.sort_values(['ticker', 'quarter_end'])
# Change in foreign IO
df['delta_foreign'] = df.groupby('ticker')['pct_foreign_total'].diff()
# Forward quarterly return
df['fwd_ret'] = df.groupby('ticker')['ret'].shift(-1)
# Drop missing
df = df.dropna(subset=['delta_foreign', 'fwd_ret'])
# Quintile portfolios each quarter
# labels=False avoids the ValueError qcut raises when duplicate bin
# edges are dropped; +1 maps integer codes 0-4 to quintiles 1-5
df['foreign_quintile'] = df.groupby('quarter_end')['delta_foreign'].transform(
lambda x: pd.qcut(x, 5, labels=False, duplicates='drop') + 1
)
# Portfolio returns
port_ret = (df.groupby(['quarter_end', 'foreign_quintile'])['fwd_ret']
.mean()
.reset_index())
port_wide = port_ret.pivot(index='quarter_end', columns='foreign_quintile',
values='fwd_ret')
port_wide['LS'] = port_wide[5] - port_wide[1]
# Test significance
results = {}
for q in [1, 2, 3, 4, 5, 'LS']:
data = port_wide[q].dropna()
mean_ret = data.mean()
t_stat = mean_ret / (data.std() / np.sqrt(len(data)))
results[q] = {
'Mean Return (%)': mean_ret * 100,
't-statistic': t_stat,
'N quarters': len(data),
}
results_df = pd.DataFrame(results).T
results_df.index.name = 'ΔForeign IO Quintile'
print("Foreign Ownership Change and Future Returns (Vietnam)")
print("=" * 60)
print(results_df.round(3).to_string())
return results_df
# foreign_return_results = test_foreign_io_returns(io_metrics)
33.11.2 Application 2: State Divestiture and Value Creation
def analyze_equitization_value(metrics: pd.DataFrame,
state_analysis: Dict) -> pd.DataFrame:
"""
Test whether reductions in state ownership are associated with
subsequent value creation (higher returns, improved governance).
Hypothesis: State divestiture reduces agency costs, improves
operational efficiency, and attracts institutional investors,
leading to positive abnormal returns.
Uses a difference-in-differences approach:
Treatment: Firms experiencing >10pp drop in state ownership
Control: Matched firms with stable state ownership
"""
df = metrics.copy()
events = state_analysis['equitization_events']
if len(events) == 0:
print("No equitization events detected.")
return pd.DataFrame()
# Get treated firms and their event quarters
treated = events[['ticker', 'quarter_end']].drop_duplicates()
treated['treated'] = 1
# Merge with metrics
df = df.merge(treated, on=['ticker', 'quarter_end'], how='left')
df['treated'] = df['treated'].fillna(0)
# Pre/post comparison for treated firms
treated_tickers = treated['ticker'].unique()
results = []
for ticker in treated_tickers:
firm = df[df['ticker'] == ticker].sort_values('quarter_end')
event_row = firm[firm['treated'] == 1]
if len(event_row) == 0:
continue
event_q = event_row.iloc[0]['quarter_end']
# Pre-event (4 quarters before)
pre = firm[firm['quarter_end'] < event_q].tail(4)
# Post-event (4 quarters after)
post = firm[firm['quarter_end'] > event_q].head(4)
if len(pre) < 2 or len(post) < 2:
continue
results.append({
'ticker': ticker,
'event_quarter': event_q,
'state_pct_pre': pre['pct_state'].mean(),
'state_pct_post': post['pct_state'].mean(),
'foreign_pct_pre': pre['pct_foreign_total'].mean(),
'foreign_pct_post': post['pct_foreign_total'].mean(),
'n_inst_pre': pre['n_inst_owners'].mean(),
'n_inst_post': post['n_inst_owners'].mean(),
'ret_pre': pre['ret'].mean(),
'ret_post': post['ret'].mean(),
})
if results:
results_df = pd.DataFrame(results)
# Paired t-tests
print("Equitization Value Analysis")
print("=" * 60)
for metric in ['state_pct', 'foreign_pct', 'n_inst', 'ret']:
pre_col = f'{metric}_pre'
post_col = f'{metric}_post'
diff = results_df[post_col] - results_df[pre_col]
t_stat, p_val = stats.ttest_1samp(diff.dropna(), 0)
print(f" Δ{metric}: {diff.mean():.4f} (t={t_stat:.2f}, p={p_val:.3f})")
return results_df
return pd.DataFrame()
# equitization_results = analyze_equitization_value(io_metrics, state_analysis)
33.11.3 Application 3: Institutional Herding in Vietnam
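Before the full implementation, a toy calculation clarifies the LSV measure, HM = |p − E[p]| − E[|p − E[p]|], where the second term corrects for the mechanical deviation expected even under independent trading. The numbers here are invented; the adjustment factor is computed from the binomial pmf exactly as in the function below (stdlib `math.comb` stands in for `scipy.stats.binom`):

```python
from math import comb

def expected_abs_dev(n: int, p: float) -> float:
    """E|b/n - p| when each of n traders buys independently w.p. p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * abs(k / n - p)
               for k in range(n + 1))

# Hypothetical stock-quarter: 10 institutions trade, 8 are buyers;
# the market-wide buy propensity E[p] that period is 0.5
n, buys, E_p = 10, 8, 0.5
hm = abs(buys / n - E_p) - expected_abs_dev(n, E_p)

print(round(hm, 3))  # → 0.177
```

Even with purely independent trading, |p − E[p]| averages about 0.123 for n = 10, so the raw 0.30 imbalance shrinks to a herding measure of roughly 0.177 — illustrating why the adjustment factor matters most for thinly traded stocks with few institutional traders.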
def compute_herding_vietnam(trades: pd.DataFrame,
owner_types: Optional[List[str]] = None) -> Tuple[pd.DataFrame, pd.DataFrame]:
"""
Compute the Lakonishok, Shleifer, and Vishny (1992) herding measure
adapted for the Vietnamese market.
Can be computed separately for:
- All institutional investors
- Foreign institutions only
- Domestic institutions only
The herding measure captures whether institutions systematically
trade in the same direction beyond what chance would predict.
"""
from scipy.stats import binom
t = trades.copy()
if owner_types:
t = t[t['owner_type'].isin(owner_types)]
t['is_buy'] = (t['trade'] > 0).astype(int)
# For each stock-period
stock_trades = t.groupby(['ticker', 'date']).agg(
n_traders=('shareholder_name', 'nunique'),
n_buyers=('is_buy', 'sum'),
).reset_index()
# Minimum traders threshold
stock_trades = stock_trades[stock_trades['n_traders'] >= 3]
stock_trades['p_buy'] = stock_trades['n_buyers'] / stock_trades['n_traders']
# Expected proportion per period
E_p = stock_trades.groupby('date').apply(
lambda g: g['n_buyers'].sum() / g['n_traders'].sum()
).reset_index(name='E_p')
stock_trades = stock_trades.merge(E_p, on='date')
# Adjustment factor
def expected_abs_dev(n, p):
k = np.arange(0, n + 1)
probs = binom.pmf(k, n, p)
return np.sum(probs * np.abs(k / n - p))
stock_trades['adj_factor'] = stock_trades.apply(
lambda r: expected_abs_dev(int(r['n_traders']), r['E_p']), axis=1
)
stock_trades['hm'] = (np.abs(stock_trades['p_buy'] - stock_trades['E_p']) -
stock_trades['adj_factor'])
stock_trades['buy_herd'] = np.where(
stock_trades['p_buy'] > stock_trades['E_p'], stock_trades['hm'], np.nan
)
stock_trades['sell_herd'] = np.where(
stock_trades['p_buy'] < stock_trades['E_p'], stock_trades['hm'], np.nan
)
# Time series of herding
ts_herding = stock_trades.groupby('date').agg(
mean_hm=('hm', 'mean'),
mean_buy_herd=('buy_herd', 'mean'),
mean_sell_herd=('sell_herd', 'mean'),
pct_herding=('hm', lambda x: (x > 0).mean()),
n_stocks=('ticker', 'nunique'),
).reset_index()
print(f"Herding Analysis ({owner_types or 'All Institutions'}):")
print(f" Mean HM: {stock_trades['hm'].mean():.4f}")
print(f" Mean Buy Herding: {stock_trades['buy_herd'].mean():.4f}")
print(f" Mean Sell Herding: {stock_trades['sell_herd'].mean():.4f}")
print(f" % stocks with herding: {(stock_trades['hm'] > 0).mean():.1%}")
return stock_trades, ts_herding
# herding_all, herding_ts = compute_herding_vietnam(trades)
# herding_foreign, _ = compute_herding_vietnam(
# trades, owner_types=[OwnershipType.FOREIGN_INST]
# )
33.12 Conclusion and Practical Recommendations
33.12.1 Summary of Measures
Table 33.5 summarizes all institutional ownership measures developed in this chapter for the Vietnamese market.
| Measure | Definition | Key Adaptation for Vietnam | Python Function |
|---|---|---|---|
| IO Ratio | Inst. shares / TSO | Decomposed into state, foreign, domestic | compute_ownership_decomposition() |
| HHI Concentration | \(\sum w_j^2\) | Separate HHI for total, non-state, foreign | compute_io_metrics_vietnam() |
| ΔBreadth | Lehavy-Sloan adjusted | Applied to irregular disclosure intervals | compute_io_metrics_vietnam() |
| FOL Utilization | Foreign % / FOL limit | Vietnam-specific; no US equivalent | FOLAnalyzer |
| FOL Premium | Price impact of FOL proximity | Cross-sectional regression approach | FOLAnalyzer.estimate_fol_premium() |
| Trades | ΔShares (corp-action adjusted) | Critical: adjust for stock dividends | derive_trades_vectorized_vietnam() |
| Fund Turnover | min(B,S)/avg(A) | Semi-annual frequency; annualized | compute_fund_analytics() |
| SOE Status | State ownership > 50% | Tracks equitization program | analyze_state_ownership() |
| LSV Herding | \(|p - E[p]| - E[|p - E[p]|]\) | Separate foreign vs domestic herding | compute_herding_vietnam() |
| Co-Ownership Network | Shared institutional holders | Reveals conglomerate linkages | construct_stock_coownership_network() |
33.12.2 Data Quality Checklist for Vietnam
33.12.3 Comparison with US Framework
| Dimension | US (WRDS/13F) | Vietnam (DataCore.vn) |
|---|---|---|
| Disclosure | Quarterly 13F (mandatory) | Annual reports + event-driven |
| Coverage | Institutions > $100M AUM | All shareholders in annual reports |
| Ownership observed | Long positions only | Complete decomposition |
| IO can exceed 100% | Yes (short selling) | No (by construction) |
| Permanent ID | CRSP PERMNO | Ticker (with manual tracking of changes) |
| Adjustment factors | CRSP cfacshr | Must build from corporate actions |
| Investor classification | LSEG typecode / Bushee | State/Foreign/Domestic/Individual |
| Short selling | Not in 13F; exists in market | Very limited; not a concern |
| Unique features | — | FOL, SOE ownership, stock dividend frequency |