About BizPub.ai

A publication database for top business research journals.

The Problem

Existing business publication databases and ranking services share some common issues:

  • Outdated interfaces with poor user experience (“It comes down to taste”)
  • Unclear data collection and processing methods
  • Closed ranking algorithms that lack reproducibility
  • No open data access for research and community validation
  • Infrequent and irregular updates

The Challenges

Building a publication index sounds straightforward, but in practice it raises several challenges:

  • Licensed APIs won't work — Scopus and Web of Science have license restrictions that prohibit building competing indexes or redistribution of their data.
  • Open APIs are insufficient — CrossRef, OpenAlex, and Semantic Scholar have incomplete and mediocre-quality data for affiliations, keywords, and editorial roles. Full metadata requires crawling each publisher's site directly.
  • Messy affiliations — Raw strings are inconsistent, incomplete, or missing. Parsing them into structured fields is hard, and the same university appears under dozens of name variations that must be matched to canonical identifiers.
  • Non-article filtering — Editorials, errata, calls for papers, and award announcements must be separated from real research articles.
  • Author disambiguation — “Wei Li” could be five different researchers; “J. Smith” and “John Smith” at the same institution could be the same person.
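
To make the affiliation and author-name challenges concrete, here is a minimal Python sketch. It is not the matching logic BizPub.ai actually uses. The alias table, the "inst:udel" identifier, and the function names are all hypothetical, and a real system would need a far larger alias set plus extra evidence such as co-authors or ORCID iDs.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical alias table: normalized name variant -> canonical identifier.
# The identifier is a placeholder, not a real registry ID.
ALIASES = {
    "university of delaware": "inst:udel",
    "univ of delaware": "inst:udel",
    "u delaware": "inst:udel",
}

def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def match_affiliation(raw: str) -> Optional[str]:
    """Return the canonical ID of the first alias found in a raw string, else None."""
    cleaned = f" {normalize(raw)} "
    # Check longer aliases first so the most specific variant wins.
    for alias in sorted(ALIASES, key=len, reverse=True):
        if f" {alias} " in cleaned:
            return ALIASES[alias]
    return None

def names_compatible(a: str, b: str) -> bool:
    """True if two names share a surname and their first given names agree,
    where a single-letter initial matches any name starting with that letter."""
    parts_a, parts_b = normalize(a).split(), normalize(b).split()
    if len(parts_a) < 2 or len(parts_b) < 2:
        return False
    if parts_a[-1] != parts_b[-1]:
        return False
    first_a, first_b = parts_a[0], parts_b[0]
    if len(first_a) == 1 or len(first_b) == 1:
        return first_a[0] == first_b[0]
    return first_a == first_b
```

Note that names_compatible("Wei Li", "Wei Li") returns True even when the two records belong to different researchers. A name match can only nominate candidates for merging; it cannot confirm them, which is why disambiguation is hard.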

Past Attempts

I have tried to build this with my students for many years. Each attempt failed in the same ways:

  • Student developers come and go — code goes stale, bugs go unfixed
  • Publishers change their sites and crawlers break
  • Rule-based parsing and matching algorithms are brittle
  • Manual data validation and correction is labor intensive and costly

This Project

BizPub.ai is my experiment in using AI agents to autonomously crawl, clean, and maintain the publication database—all with as little human intervention as possible.

The agents:

  • Analyze publisher websites to develop crawlers
  • Collect data politely, without overloading publisher websites
  • Run automatic quality validation
  • Self-heal when publisher sites change
  • Call LLMs to parse and analyze data
  • Resolve GitHub issues reported by humans
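
As one illustration of collecting data without overloading a site, a crawler can enforce a minimum delay between requests to the same host. The Python sketch below shows that idea only. The class name, the 5-second default, and the injectable clock are assumptions made for the example, not details of the BizPub.ai agents, and a production crawler would also honor each site's robots.txt.

```python
import time
from urllib.parse import urlparse

class PoliteScheduler:
    """Tracks the earliest time each host may be requested again (hypothetical helper)."""

    def __init__(self, min_delay_seconds: float = 5.0, clock=time.monotonic):
        self.min_delay = min_delay_seconds  # assumed delay; tune per publisher
        self.clock = clock                  # injectable so the logic can be tested
        self._next_allowed = {}             # host -> earliest allowed request time

    def wait_time(self, url: str) -> float:
        """Return seconds to wait before requesting url, and reserve that slot."""
        host = urlparse(url).netloc
        now = self.clock()
        start = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = start + self.min_delay
        return start - now
```

A caller would run time.sleep(scheduler.wait_time(url)) before each fetch. Requests to different hosts do not delay one another, while repeated requests to one host are spaced at least min_delay_seconds apart.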

The entire project is now maintained by me and AI. See the data process in detail.

About Me

I'm Harry Wang, a Professor of Management Information Systems at the University of Delaware and Founder of PaperFox.ai.

Sponsor

PaperFox.ai

Infrastructure and AI APIs for BizPub.ai are sponsored by PaperFox.ai — an all-in-one AI-powered platform for running academic conferences. Many of the technologies powering this project originated from PaperFox. Consider using PaperFox for your academic conferences to support this project.

Technology Stack

Claude Code, OpenAI API, Next.js, Prisma, PostgreSQL, Playwright, Tailwind CSS, shadcn/ui

Disclaimer

All publication data on BizPub.ai is collected from publicly accessible journal websites for research and educational purposes. We respect the intellectual property rights of publishers and authors. If you believe any content should be removed, please contact me.