
Using the EDGAR log file data set



Publishing details

Social Science Research Network

Publication Year



The SEC's EDGAR log file data set is a collection of web server log files that allows researchers to study the demand for SEC filings. This multi-terabyte data set provides a direct measure of demand for financial reports, but the log files must be filtered to remove downloads by computer programs (robots), and the sheer size of the files presents big data challenges. This paper compares three methods for counting human views in the EDGAR log files and aggregates the data on a filing-day basis so that it can be handled with desktop hardware and statistical analysis software. Overall, the three methods agree on the robot-human classification for 96 percent of users, but for sample 10-K filings they can disagree by up to 27 percent. Download counts may be biased by up to 36 percent if multiple views by the same user are counted. The Ryans (2017) method eliminates multiple download counting and appears to classify robots effectively in cases where the measures disagree. The choice of measure may be particularly important when studying demand for Forms 10-K, 10-Q, 4, and 13F-HR, as well as SEC comment letters. The aggregated data and sample code are available from the author.
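To make the filing-day aggregation concrete, the following is a minimal Python sketch and not the paper's code. It assumes each log record has been reduced to a hypothetical `(ip, date, accession)` tuple, and it uses a placeholder robot screen (a fixed cap on requests per user per day, `MAX_DAILY_REQUESTS`) that is an illustrative assumption, not any of the three methods compared in the paper. Repeat views by the same user are counted once per filing-day.

```python
from collections import Counter, defaultdict

# Hypothetical cutoff, for illustration only: a user making more than this
# many requests in one day is treated as a robot for that day.
MAX_DAILY_REQUESTS = 3


def filing_day_views(records, max_daily_requests=MAX_DAILY_REQUESTS):
    """Count distinct presumed-human users per (accession, date).

    records: iterable of (ip, date, accession) tuples (assumed layout).
    """
    records = list(records)

    # Total requests per user-day, used by the placeholder robot screen.
    per_user_day = Counter((ip, date) for ip, date, _ in records)

    viewers = defaultdict(set)
    for ip, date, accession in records:
        if per_user_day[(ip, date)] > max_daily_requests:
            continue  # drop user-days flagged as robots
        # A set keeps each user once, so repeat views are not double counted.
        viewers[(accession, date)].add(ip)

    return {key: len(ips) for key, ips in viewers.items()}


sample = [
    ("10.0.0.1", "2016-03-01", "A-1"),
    ("10.0.0.1", "2016-03-01", "A-1"),  # repeat view by the same user
    ("10.0.0.2", "2016-03-01", "A-1"),
    ("10.0.0.9", "2016-03-01", "A-1"),
    ("10.0.0.9", "2016-03-01", "A-2"),
    ("10.0.0.9", "2016-03-01", "A-3"),
    ("10.0.0.9", "2016-03-01", "A-4"),  # four requests in a day: flagged
]

print(filing_day_views(sample))
# {('A-1', '2016-03-01'): 2}
```

Because the output holds one row per filing-day rather than one row per request, it is far smaller than the raw logs, which is what makes analysis on desktop hardware practical.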


EDGAR downloads; SEC filings; Demand for financial information; Investor attention; Big data



