Extracting Traceable Formal Representations from Natural Language Regulatory Documents

Regulations, laws, and policies that affect many aspects of our lives are represented predominantly as natural language documents. For example, the Food and Drug Administration's Code of Federal Regulations (FDA CFR) governs the operations of American blood banks. The CFR is framed by experts in the field of medicine and regulates the tests that must be performed on blood donations before they are used. In such safety-critical scenarios, it is desirable to assess formally whether:

  • The regulation (CFR) is consistent, and
  • An organization (blood bank) conforms to the CFR.

The proposed research develops natural language processing (NLP) techniques for extracting formal representations from regulatory documents. These representations are then analyzed for consistency and used to determine whether an organization conforms to the regulation. The project is a collaboration between researchers in NLP and formal methods, and aims to produce an environment in which policy can coexist in natural and formal languages.
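
To make the idea of a conformance check concrete, here is a minimal sketch in Python. It is an illustration only, not the project's representation or method: the rule "every required test must be performed on a donation before it is used" is a simplified paraphrase, and the names `REQUIRED_TESTS`, `Donation`, `conforms`, and `violations`, as well as the test names `test_a` and `test_b`, are hypothetical and not taken from the CFR. Consistency analysis is not covered by this sketch.

```python
from dataclasses import dataclass, field

# Hypothetical test names, for illustration only; not taken from the CFR.
REQUIRED_TESTS = frozenset({"test_a", "test_b"})


@dataclass
class Donation:
    """One record in a (hypothetical) blood bank log."""
    donation_id: str
    tests_performed: set = field(default_factory=set)
    used: bool = False


def conforms(donation, required=REQUIRED_TESTS):
    """Rule as a predicate: used(d) implies every required test was performed on d."""
    return (not donation.used) or required <= donation.tests_performed


def violations(donations, required=REQUIRED_TESTS):
    """Return the ids of donations that break the rule, in log order."""
    return [d.donation_id for d in donations if not conforms(d, required)]


log = [
    Donation("d1", {"test_a", "test_b"}, used=True),   # fully tested, then used
    Donation("d2", {"test_a"}, used=True),             # used with a test missing
    Donation("d3", set(), used=False),                 # untested but not yet used
]
print(violations(log))
```

Under these assumptions, only the donation that was used with a required test missing is reported. An unused, untested donation does not violate the rule, because the obligation applies only at the point of use.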