Constellate is the text analytics service from ITHAKA (JSTOR and Portico). It is a platform for teaching, learning, and performing text analysis using archival repositories of scholarly and primary source content.
Dimensions Plus includes grants, publications, citations, alternative metrics, clinical trials, patents, and policy documents. To enable API access, register with your NetID and password, then email support@dimensions.ai.
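Once API access is enabled, querying Dimensions is usually a two-step exchange: trade an API key for a short-lived token, then send queries written in the Dimensions Search Language (DSL). The sketch below only builds the two requests. The endpoint URLs, the `{"key": ...}` body, and the `JWT` authorization scheme are assumptions based on the public Dimensions Analytics API documentation, so check them against the current docs before relying on them.

```python
import json
import urllib.request

# Assumed endpoints (from the public Dimensions Analytics API docs); verify before use.
AUTH_URL = "https://app.dimensions.ai/api/auth.json"
DSL_URL = "https://app.dimensions.ai/api/dsl.json"


def build_auth_request(api_key: str) -> urllib.request.Request:
    """POST the API key as JSON; the response is expected to contain a 'token' field."""
    body = json.dumps({"key": api_key}).encode("utf-8")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(token: str, dsl_query: str) -> urllib.request.Request:
    """POST a DSL query as the raw request body, authenticated with the token."""
    return urllib.request.Request(
        DSL_URL,
        data=dsl_query.encode("utf-8"),
        headers={"Authorization": f"JWT {token}"},
        method="POST",
    )
```

To run it for real, pass each request to `urllib.request.urlopen` and read the JSON response, for example `token = json.load(urllib.request.urlopen(build_auth_request(key)))["token"]`. Keeping request construction separate from sending makes the requests easy to inspect before any quota is spent.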
Elsevier's API program allows you to integrate content and data from Elsevier products into your own websites and applications. The APIs are free for the products Princeton subscribes to: Scopus, Engineering Village, and subscribed journals in ScienceDirect.
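As an example of the Elsevier APIs, a Scopus search is an HTTP GET with the API key passed in a header. The sketch below builds such a request. The endpoint, the `X-ELS-APIKey` header, and the `query`/`count`/`start` parameters reflect Elsevier's developer documentation as we understand it; treat them as assumptions, and note that some Elsevier endpoints also check that you are on the campus network.

```python
import urllib.parse
import urllib.request

# Assumed Scopus Search API endpoint; confirm on the Elsevier developer portal.
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"


def build_scopus_search(api_key: str, query: str, count: int = 25, start: int = 0) -> urllib.request.Request:
    """Build a paged Scopus search request that asks for a JSON response."""
    params = urllib.parse.urlencode({"query": query, "count": count, "start": start})
    return urllib.request.Request(
        f"{SCOPUS_SEARCH_URL}?{params}",
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )
```

Paging through results means increasing `start` by `count` on each call, for example `build_scopus_search(key, "TITLE-ABS-KEY(text mining)", start=25)` for the second page.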
Data for Research (DfR) provides datasets of content on JSTOR for use in research and teaching. Researchers may use DfR to define and submit their desired dataset to be automatically processed. Data available through the service includes metadata, n-grams, and word counts for most articles and book chapters, and for all research reports and pamphlets on JSTOR. Datasets are produced at no cost to researchers and may include data for up to 25,000 documents.
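A common first step with a word-count dataset is to merge the per-document counts into corpus-wide totals. The sketch below assumes, hypothetically, that each document's word counts arrive as a CSV file with one header row followed by `word,count` rows. Check the actual layout of your dataset and adjust the parsing to match.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import Iterable


def read_word_counts(lines: Iterable[str]) -> Counter:
    """Parse one document's word counts (assumed layout: header row, then word,count rows)."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    counts = Counter()
    for row in reader:
        if len(row) >= 2:
            # Lowercase so "Data" and "data" are counted together.
            counts[row[0].lower()] += int(row[1])
    return counts


def corpus_counts(folder: Path) -> Counter:
    """Sum the word counts across every CSV file in a folder."""
    total = Counter()
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            total.update(read_word_counts(fh))
    return total
```

For example, `corpus_counts(Path("dfr_dataset/wordcounts")).most_common(20)` lists the 20 most frequent words. The folder name here is hypothetical.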
The LexisNexis Web Services Kit is a mediated service that allows bulk download of Nexis Uni content (formerly LexisNexis Academic). Without the API, Nexis Uni allows downloads of up to 250 documents and 1,000 metadata records. Contact your subject librarian for access to the LexisNexis Web Services Kit.
A Python tool for downloading, updating, and maintaining a local repository of all PLOS XML article files. Use this program to obtain the PLOS XML article files rather than web scraping them.
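Once the article files are downloaded, Python's standard library can read basic metadata from them. PLOS articles use the JATS XML tag set, so the sketch below reads the DOI from `article-id[@pub-id-type='doi']` and the title from `title-group/article-title`. It does not use the download tool's own API, and real articles may contain edge cases (such as entity declarations) that this minimal parser does not handle.

```python
import xml.etree.ElementTree as ET


def article_metadata(xml_text: str) -> dict:
    """Extract the DOI and title from one JATS-formatted article."""
    root = ET.fromstring(xml_text)
    doi = root.findtext(".//front/article-meta/article-id[@pub-id-type='doi']")
    title_el = root.find(".//front/article-meta/title-group/article-title")
    # itertext() also collects text inside inline markup such as <italic>.
    title = "".join(title_el.itertext()).strip() if title_el is not None else None
    return {"doi": doi, "title": title}
```

To process the whole corpus, loop over the downloaded `*.xml` files, read each one as text, and pass it to `article_metadata`.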
Our mission is to provide rapid dissemination of scientific results at no cost to authors or readers. Providing free Application Programming Interfaces (APIs) helps us to advance that mission by enabling platforms and projects that extend the discoverability of arXiv e-prints and provide valuable services to scientists and interested readers.
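The arXiv API takes a query string and returns results as an Atom XML feed. The sketch below builds a query URL for the `export.arxiv.org/api/query` endpoint with the `search_query`, `start`, and `max_results` parameters, then pulls the ID and title out of each Atom `entry`. Field prefixes such as `ti:` and `cat:` follow the arXiv API user manual. The parser reads only these two fields and assumes the rest of the feed is well formed.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "https://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def arxiv_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Build a paged arXiv API query URL."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_entries(feed_xml: str) -> list[dict]:
    """Return the id and whitespace-normalized title of each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        raw_title = entry.findtext("atom:title", default="", namespaces=ATOM)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM).strip(),
            "title": " ".join(raw_title.split()),  # titles often contain line breaks
        })
    return entries
```

For bulk harvesting, arXiv asks clients to pause between requests (about three seconds), so a paging loop that increments `start` should sleep between calls.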
We hope this list of APIs, bulk downloads, and tutorials will help you begin exploring the many ways the Library of Congress provides machine-readable access to its digital collections.
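Many loc.gov pages return machine-readable results when `fo=json` is added to the URL. The sketch below builds a search URL in that form and lists the titles in a parsed response. The `results` list and its `title` field are assumptions about the response shape, so inspect a live response before building on them.

```python
import urllib.parse


def loc_search_url(query: str) -> str:
    """Build a loc.gov search URL that asks for JSON instead of HTML."""
    return "https://www.loc.gov/search/?" + urllib.parse.urlencode({"q": query, "fo": "json"})


def result_titles(payload: dict) -> list[str]:
    """List result titles from a parsed JSON response (assumed 'results' list)."""
    return [item.get("title", "") for item in payload.get("results", [])]
```

The response can be fetched with `urllib.request.urlopen(loc_search_url("baseball cards"))` and decoded with `json.load` before passing it to `result_titles`.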
CORE provides a central API to access full content from tens of thousands of openly available scientific publications from thousands of OA repositories. Full datasets available by request.
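Requests to the CORE API are authenticated with a personal API key. The sketch below builds a full-text search request. The v3 `search/works` endpoint, the `q` and `limit` parameters, and the `Bearer` authorization header are assumptions based on CORE's published API documentation, so confirm them when you register for a key.

```python
import urllib.parse
import urllib.request

# Assumed CORE v3 search endpoint; verify against the current CORE API docs.
CORE_SEARCH = "https://api.core.ac.uk/v3/search/works"


def build_core_search(api_key: str, query: str, limit: int = 10) -> urllib.request.Request:
    """Build a CORE works search request authenticated with a bearer token."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return urllib.request.Request(
        f"{CORE_SEARCH}?{params}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Send the request with `urllib.request.urlopen` and decode the body with `json.load`. For corpus-scale work, the full datasets available by request avoid paging through the API.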
Social media and the web
For data collection from social media, researchers typically use the public APIs provided by the platforms themselves, such as the following:
Access data from posts, threads, comments, users and more from reddit and subreddits.
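Reddit returns posts as a JSON "Listing": the items sit in `data.children`, each child's fields sit under its own `data` key, and `data.after` holds the cursor for the next page. The sketch below builds a URL for a subreddit's newest posts and extracts a few post fields. The `new.json` URL form and the `t3` kind for posts are assumptions based on Reddit's public JSON format. Reddit expects a descriptive `User-Agent` header, and sustained collection should go through its OAuth-authenticated API.

```python
import urllib.parse
from typing import Optional


def new_posts_url(subreddit: str, limit: int = 100, after: Optional[str] = None) -> str:
    """Build a URL for a subreddit's newest posts, optionally continuing from a cursor."""
    params = {"limit": limit}
    if after:
        params["after"] = after
    return f"https://www.reddit.com/r/{subreddit}/new.json?{urllib.parse.urlencode(params)}"


def parse_listing(listing: dict) -> tuple[list[dict], Optional[str]]:
    """Return (posts, next-page cursor) from a parsed Listing; non-post children are skipped."""
    data = listing.get("data", {})
    posts = [
        {
            "id": child["data"].get("id"),
            "title": child["data"].get("title"),
            "score": child["data"].get("score"),
        }
        for child in data.get("children", [])
        if child.get("kind") == "t3"  # "t3" marks a post (link) in Reddit's JSON
    ]
    return posts, data.get("after")
```

To page through a subreddit, pass the returned cursor back as `after` until it comes back as `None`.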
Historical Reddit data has been collected at http://files.pushshift.io/reddit/ as monthly downloads of compressed JSON files.
Public streams provide access to public data flowing through Twitter and are suitable for following specific users or topics and for data mining. You can also access single-user streams, which contain roughly all of the data corresponding to a single user's view of Twitter.
Access to business data, including location, photos, Yelp rating, price levels, hours of operation, and types of transactions. Also includes a Review API, which returns up to 3 review excerpts for a business.
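Yelp's business and review endpoints share a base URL and use a bearer-token API key. The sketch below builds requests for a business search and for a single business's reviews. The `/v3/businesses/search` and `/v3/businesses/{id}/reviews` paths and the `term`/`location` parameters follow the Yelp Fusion documentation as we understand it. The business ID in the usage example is a placeholder.

```python
import urllib.parse
import urllib.request
from typing import Optional

YELP_API = "https://api.yelp.com/v3"


def build_yelp_request(api_key: str, path: str, params: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated GET request for a Yelp API path such as '/businesses/search'."""
    url = f"{YELP_API}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
```

For example, `build_yelp_request(key, "/businesses/search", {"term": "coffee", "location": "Princeton, NJ"})` searches for businesses, and `build_yelp_request(key, "/businesses/abc123/reviews")` requests the (at most three) review excerpts for a business whose ID is `abc123`.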
The Congress.gov API includes bills, amendments, summaries, Congresses, members, the Congressional Record, committee reports, nominations, treaties, and House Communications. Over time we will be adding hearing transcripts and Senate Communications. Sign up for a free API key to use it.
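Congress.gov API calls pass the API key as a query parameter, alongside the response format and paging parameters. The sketch below builds such URLs. The `https://api.congress.gov/v3/` base, the `api_key`/`format`/`limit`/`offset` parameters, and endpoint paths such as `bill/117` (bills from the 117th Congress) reflect the API documentation as we understand it; verify them against the current docs.

```python
import urllib.parse


def congress_url(endpoint: str, api_key: str, **params) -> str:
    """Build a Congress.gov API v3 URL that requests JSON; extra keywords become query parameters."""
    query = {"api_key": api_key, "format": "json", **params}
    return f"https://api.congress.gov/v3/{endpoint.strip('/')}?{urllib.parse.urlencode(query)}"
```

For example, `congress_url("bill/117", key, limit=20, offset=0)` requests the first 20 bills, and raising `offset` by 20 on each call pages through the rest.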
Full text of United States Congressional Hearings (both House and Senate) 1824-2020 as extracted by ProQuest from its various Congressional hearings collections and delivered in bulk as XML files. Pre-processing completed by Politics Librarian, Jeremy Darrington, to extract individual hearing files, rename by hearing ID, and group into folders by decade. By accessing the data, you agree to abide by the included Terms of Use file. Read it thoroughly before use.
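Because the hearing files are grouped into decade folders and named by hearing ID, the folder layout itself can serve as metadata. The sketch below counts hearings per decade and recovers a hearing ID from a file path. It assumes, hypothetically, that each XML file sits directly inside its decade folder and that the file name minus `.xml` is the hearing ID. Check the layout and the Terms of Use file in the actual download.

```python
from collections import Counter
from pathlib import PurePath
from typing import Iterable


def hearings_per_decade(paths: Iterable[PurePath]) -> Counter:
    """Count XML files by their parent folder name (assumed to be the decade folder)."""
    return Counter(path.parent.name for path in paths if path.suffix == ".xml")


def hearing_id(path: PurePath) -> str:
    """Return the file name without its extension (assumed to be the hearing ID)."""
    return path.stem
```

On a local copy, `hearings_per_decade(Path("hearings").rglob("*.xml"))` gives the counts; the `hearings` folder name is a placeholder for wherever the data was unpacked.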
Bulk data downloads of major US Government publications including Congressional Bills, Commerce Business Daily, Federal Register, Public Papers of the Presidents of the United States, Supreme Court Decisions 1937-1975 (FLITE) and more.
Includes all official, book-published United States case law — every volume designated as an official report of decisions by a court within the United States. Scope includes all state courts and federal courts. Research scholars can qualify for bulk data access by agreeing to certain use and redistribution restrictions. You can request a bulk access agreement by creating an account and then visiting your account page.
Provides access to real-time documents, press releases, and social media posts from candidates for Congress and governor across the U.S. Options to compare candidates and groups (e.g. Senate Democrats vs. Republicans), filter by geography or demographics, and to generate term frequency charts and word clouds. (Princeton-subscription resource)
"Open data" is publicly available data that is structured in a way that enables the data to be fully discoverable and usable by end users. It can be freely used, reused and redistributed by anyone. Its value lies not only in what it does today, but also in what it can do in the future. It is a valuable national resource and a strategic asset to the federal government, its partners, and the public.