Hard: Financial Analysis Pipeline

This test evaluated the ability to design, implement, and execute a complete end-to-end data analysis pipeline combining data acquisition, feature engineering, statistical analysis, visualization, and interpretation. The task involved building a real-world application using the skinny projection pursuit index on financial data.

Overview

This test involved designing and implementing a complete data analysis pipeline: acquiring financial data, computing features, applying directed tours with the skinny PP index, and generating visualizations.

The pipeline demonstrates integration of multiple R packages, handling real-world data, and applying projection pursuit to financial pattern discovery.

Approach

The pipeline follows five steps:

Data acquisition: Retrieve historical price data using yahoofinancer
Feature engineering: Compute returns, volatility, correlations
Exploratory analysis: Run guided tours with skinny index as optimization criterion
Comparative analysis: Compare results across different PP indices
Visualization: Generate plots and animated tours documenting findings

Implementation

Code snippet — Full pipeline code available in Gist

symbols <- c(
  "BTC-USD", "ETH-USD",
  "AAPL", "MSFT", "NVDA",
  "TSLA", "AMZN", "GOOG", "META"
)

# Download prices and compute log returns
getSymbols(symbols, src = "yahoo", from = "2025-01-01", to = "2025-12-31")
price_matrix <- as.matrix(data[, -1])
returns <- diff(log(price_matrix))
X <- scale(returns)

# Define Skinny Index
skinny <- function() {
  function(mat) {
    cassowaryr::sc_skinny(mat[,1], mat[,2])
  }
}

# Render animated guided tour
render_gif(
  X,
  guided_tour(skinny()),
  display_xy(),
  gif_file = "finance_skinny_tour.gif",
  apf = 1/30,
  frames = 50
)

dev.off()

Results

Animated Tour Visualization

The guided tour generates a GIF showing how the projection evolves as the optimization searches for elongated point patterns.

Additional Highlight: Score Distribution Visualization

An enhancement to the pipeline is generating a histogram of skinny scores across 500 random projections. This provides insight into the distribution of PP index values and helps identify outliers.

# Search for best projection across 500 random basis
n_search <- 500
proj_list <- replicate(
  n_search,
  basis_random(ncol(X), 2),
  simplify = FALSE
)

scores <- sapply(proj_list, function(p) {
  proj <- X %*% p
  cassowaryr::sc_skinny(proj[,1], proj[,2])
})

# Visualize skinny score distribution
png("skinny_score_distribution.png", width = 800, height = 600)
hist(
  scores,
  breaks = 30,
  col = "lightblue",
  main = "Distribution of Skinny Scores",
  xlab = "Skinny Index Value"
)

Among 500 random projections, the skinny index produces a distribution of scores. The histogram shows the frequency of different index values.

Key Findings

Tech stock cluster: AAPL, MSFT, NVDA, META, and GOOG show correlated movement with similar arrow directions.
Crypto separation: BTC and ETH behave differently from equities, indicating distinct return dynamics.
Volatility patterns: TSLA exhibits higher volatility, causing separation from other tech stocks in projection space.
Daily returns structure: Most observations cluster near the center with occasional large movements at the periphery.

Interpretation

The skinny index successfully identified projections revealing latent structure in the financial returns. The elongated point patterns highlight correlations among technology stocks and the distinct behavior of cryptocurrencies relative to traditional equities.