Framing: DPU Double Check is a web scraping bot with two functions: 1) to compile a weekly list of all new dockets that do not have an Assistant Attorney General (AAG) assigned to the case; and 2) to compile a weekly list of all briefs submitted by AAGs and all orders issued by the Department. The Department of Public Utilities (DPU) is the administrative body in charge of overseeing electric, natural gas, and water companies in Massachusetts. The Massachusetts Attorney General's Office has a division with a statutory mandate to serve as the "Office of Ratepayer Advocacy" (see Mass. Gen. Laws Ann. ch. 12, § 11E). The Office of Ratepayer Advocacy (ETD) decides which cases to intervene in. But due to limited resources and subject matter, not every case before the DPU will have an AAG appearing on behalf of ratepayers.

When a new case is filed with the DPU, a copy of the filing is normally emailed to ETD. The concern, however, is that not every new filing will be emailed to ETD, because parties do not always follow the procedural rules, and in some instances the parties filing a case are unaware of them. If time passes before an AAG is assigned to appear in a case, the AAG may miss important deadlines for issuing discovery, obtaining expert witnesses, or attending a hearing to cross-examine witnesses. It is therefore important that AAGs are assigned to cases with as little delay as possible. DPU Double Check provides a weekly report of newly filed cases that do not yet have an AAG assigned, so ETD can rest assured that no case goes unseen. ETD checks this report weekly and decides whether to assign an AAG to each case.

Additionally, because the Massachusetts Attorney General's Office is a large bureaucratic agency, upper management requires each division to submit a bi-weekly report summarizing any positions the office has taken and any decisions handed down in cases where an AAG is present. DPU Double Check therefore also provides a weekly list of briefs submitted by AAGs, along with a list of orders issued by the Department. The bot maintains a running list of cases with an AAG in appearance and specifically displays orders for those cases. If the DPU issues an order in a case that has no AAG, ETD is not concerned with the order and does not include it in the bi-weekly report. Because DPU Double Check is a new bot, it does not yet have a full, active list of all cases with AAG representation; as time goes on, this list will expand. In the meantime, the report produced by the bot includes all orders issued, as well as all orders issued where an AAG is present.

Thus, DPU Double Check solves the problem of identifying which cases do not have an AAG assigned (and therefore alerts ETD to determine whether an AAG should be assigned), and it quickly produces a report identifying the briefs and orders that need to be summarized for inclusion in the bi-weekly report sent to upper management. DPU Double Check saves ETD from manually locating this information, allowing ETD to spend more time representing the interests of ratepayers.

There are two users of DPU Double Check. The first is ETD Deputy Division Chief Nathan Forster, who is responsible for assigning cases to AAGs and for compiling the bi-weekly report.
As Deputy Division Chief, Nathan's skills are best utilized in representing the interests of ratepayers and ensuring that his division operates efficiently. On Monday mornings he assigns new cases to AAGs. On Fridays he either creates a draft bi-weekly report or finishes and submits the bi-weekly report. The second user is the division's administrative staff; currently, this means ETD Administrative Assistant Marlon Dos Santos. Marlon helps with administrative tasks, including drafting the bi-weekly report and the list of cases that do not have an AAG in appearance. Marlon has more time to spend collecting this data, but as the division's only Administrative Assistant, he is often in demand for other priority matters.

Research: There are currently no web scraping bots for the DPU file room. There are, however, free web scraping bots available online. These bots are either free with minimal features (more features unlocked after payment) or charge monthly fees. Some bots can be "trained" by using the mouse to highlight the information the user wants scraped. Apify is a typical web scraper service: the user inputs the keywords or data to be scraped, and the bot runs its scrape code. The downside is that the free version only retains data for 7 days, which means a running list cannot be maintained. Access to more features starts at $50/month and goes up to $499/month. ParseHub is similar to Apify. ParseHub offers 14-day data retention, but again this is not enough to maintain running lists. The premium versions start at $149/month and grow to $499/month. Octoparse is the most distinctive web scraper on the market. It uses a point-and-click interface that easily allows the user to select the data to be scraped. The downside, however, is that the end user may not always be sure the interface is collecting the relevant data; we do not get to see "under the hood" to determine whether the code is sufficient for our needs. Further, the prices range from $75/month up to $209/month, and we face the same data-retention issues described above.

The current method for "scraping" the DPU file room involves manually searching and interacting with the file room. ETD staff must open each docket to confirm whether an AAG is present on the case, and then must create a report with the docket information. For the bi-weekly report, ETD must either manually review the file room or rely on emails from AAGs containing the briefs they are e-filing with the DPU. This method invites human error and takes a significant amount of time.

Ideation and Prototypes: The first ideation of DPU Double Check involved three steps. First, the bot would scrape the docket information of the most recently filed cases. Second, using the docket information, it would find the URL for each specific docket. Third, it would scrape that specific URL to see whether an ETD AAG was present. This would be an accurate way to check for new dockets with or without an AAG, but it would be impractical for pulling information about briefs and orders filed (the information needed for the bi-weekly report). Further, the DPU file room does not take users to a specific URL; rather, the file room runs a report itself and then returns the docket information, so there is no uniform way to determine a docket-specific URL.

The second ideation was to scrape the file room for new filings. The DPU file room will generate a report of all new filings made within a selected time period. Because this is a wide-ranging report, the URL for the results is uniform, and the only variables in the URL are the dates in the range.
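To make that concrete, here is a minimal sketch of the date-range logic, assuming placeholder names: the base URL and the query-parameter names below are illustrative stand-ins, not the actual DPU file room query string (the working version is in "DPU_Double_Check.ipynb").

```python
# Build the weekly report URL from today's date. The base URL and the
# parameter names are placeholders standing in for the real DPU file
# room query string.
from datetime import date, timedelta

def build_report_url(days_back=7):
    end = date.today()
    start = end - timedelta(days=days_back)
    return (
        "https://example-dpu-fileroom.mass.gov/report"  # placeholder URL
        f"?startDate={start:%m/%d/%Y}&endDate={end:%m/%d/%Y}"
    )

print(build_report_url())  # e.g. ...?startDate=12/09/2019&endDate=12/16/2019
```

Because the only moving parts are the two dates, the same function can serve both the Monday and Friday scrapes.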
Thus, this portion of the file room contains all new filings, all new interventions by AAGs, all new briefs filed by an AAG, and all new orders issued. To determine the date range that the bot should scrape, the original idea was to use QnA to elicit a custom date range from the end user, so the end user could craft the specific date range the bot should check. The benefit of this approach is that during holidays or vacation periods, the user could expand the scrape range to collect the necessary data. The downside is that the user would have to wait for the bot to scrape, and the bot takes a considerable amount of time when the date range is longer than 7 days.

Thus, the final idea was to automate the bot completely. Because the URL is uniform, the dates in the URL can be set to today and 7 days prior. The bot can therefore be scheduled to run weekly and fill in the 7-day date range automatically. Further, to send the results, the bot creates automatic triggers for when to send an email with the results. Prior ideations required QnA prompts asking the user whether they would like to download a report or send an email. With the automated solution, no hands-on work is needed: DPU Double Check runs by itself and sends its results by itself. No interaction is needed.

Here is how the final code works. On my laptop I have installed Anaconda, and I have created scheduled tasks that run BAT files containing the commands that run my code. The code runs on my laptop and updates a Google Sheet. The Google Sheet has a custom script enabled that sends an email of the Google Sheet as an Excel file. The email is triggered to run Mondays and Fridays between 10am and 11am. To ensure that a web scrape did occur and that the emailed report contains new data (because sometimes there are no new cases filed or briefs submitted), I have also scheduled a weekly reset of the Google Sheet. On Monday mornings and Tuesday, Wednesday, and Friday afternoons, a BAT file runs Python code that replaces last week's results with a default "error" message. This way, if the bot did not run, the email that is sent will contain rows that say "error: contact my creator," which alerts ETD that there was an error. The main failure this accounts for is Task Scheduler not running its scheduled task: because the code is on my laptop, the lid might be closed at a scheduled run time, or the battery might be dead. I have adjusted the conditions and settings of Task Scheduler so that the laptop is "woken" from sleep mode when a task is scheduled to run, and I have adjusted the battery settings to allow the laptop to be woken by scheduled tasks, but I am not sure this works. The current run times for all tasks are set for when I am usually using my laptop, and the default error messages will alert users if there is an issue. Condensed sketches of the scrape-and-update flow and of the weekly reset follow below.
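Here is a condensed sketch of the Monday run, assuming the report page renders its results as an HTML table and that an AAG appearance can be spotted by the text "Attorney General" in a parties column. The column layout, the marker text, and the sheet and tab names are all assumptions; the working version is in "DPU_Double_Check.ipynb".

```python
# Condensed sketch of the Monday run: pull the weekly report, flag dockets
# with no AAG appearance, and push them to the Google Sheet. The table
# layout, the AAG text marker, and the sheet/tab names are assumptions.
import requests
from bs4 import BeautifulSoup
import gspread

AAG_MARKER = "Attorney General"  # text assumed to signal an AAG appearance

def scrape_new_dockets(report_url):
    """Return (docket, description, has_aag) tuples from the report page."""
    html = requests.get(report_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.select("table tr")[1:]:  # skip the header row (layout assumption)
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 3:  # assumed columns: docket number, description, parties
            docket, description, parties = cells[0], cells[1], cells[2]
            results.append((docket, description, AAG_MARKER in parties))
    return results

def update_sheet(rows):
    """Overwrite the no-AAG tab with this week's unrepresented dockets."""
    client = gspread.service_account()  # reads a local credentials file
    tab = client.open("DPU Double Check").worksheet("No AAG Assigned")
    tab.clear()
    tab.append_row(["Docket", "Description"])
    for docket, description, has_aag in rows:
        if not has_aag:
            tab.append_row([docket, description])

# build_report_url is the helper from the earlier sketch
update_sheet(scrape_new_dockets(build_report_url()))
```

The Friday scrape follows the same pattern but collects filings whose type is a brief or an order instead of checking for AAG appearances.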
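And here is a minimal sketch of the weekly reset job run by the BAT file, under the same assumptions about the sheet and tab names:

```python
# Minimal sketch of the reset job: overwrite both tabs with a sentinel row
# so that a report built from a sheet the scraper never touched is visibly
# wrong. Sheet and tab names are assumptions.
import gspread

def reset_sheet():
    client = gspread.service_account()
    book = client.open("DPU Double Check")
    for name in ("No AAG Assigned", "Briefs and Orders"):
        tab = book.worksheet(name)
        tab.clear()
        tab.append_row(["error: contact my creator", "error: contact my creator"])

reset_sheet()
```

If a scheduled scrape later runs as planned, it clears these rows and writes real results in their place; if not, the sentinel text reaches ETD's inbox and flags the failure.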
User Testing: User feedback was gathered using a questionnaire. Feedback was gathered from both Nathan Forster and Marlon Dos Santos, the two primary users. The feedback results are contained in this repo folder as "User Feedback 1.docx" (from Nathan Forster) and "User Feedback 2.docx" (from Marlon Dos Santos). I have also uploaded the email response to the feedback, because it contained additional feedback; this is in the folder as "User Feedback Email.png".

Refinement: Only one round of user feedback was conducted, but the feedback was incorporated. The original version ran on Wednesdays and created a report containing both the list of cases without an AAG and the new briefs and orders. After reviewing the feedback, I updated the code to run two separate scrapes: one on Mondays and the other on Fridays. I also updated the names of the sheets in the Excel file to clarify what each sheet represents. I have uploaded "User Refinement Email.png", in which my partner and I discuss making the bot not reliant on my laptop; for longevity's sake, it is not practical to run it on my laptop indefinitely. My partner and I are working on a solution, though one has not yet been decided upon. I also updated the code to send only one email to ETD instead of two.

Intro Pitch: Delivered in class. The PowerPoint was messaged to you directly over Slack.

Complexity: The code that I created for DPU Double Check is uploaded to this folder. "DPU_Double_Check.ipynb" is the Python code for Monday mornings, which checks whether there are dockets that do not have an AAG. "DPU_Double_Check_Friday.ipynb" is the Python code for Friday mornings, which checks for new briefs or orders filed. "Error_Check_For_DPU_Double_Check.ipynb" is the Python code that inserts the default error message, which lets users confirm whether a scrape was conducted. "Email Script.png" is the Google Sheets script code, which allows the Google Sheet to be automatically emailed after selecting triggers within Google Sheets. "Email Script Trigger.png" shows the triggers for the email script. (Note: if you want the text of the email script, let me know.) "BAT Files.png" is a screenshot of the three BAT files, which tell my laptop to run the Python files.

Impact: I'm still waiting on a feedback letter, but for now I have attached screenshots of feedback that I have received throughout the course of the project. Please find the feedback I have so far in "Feedback Screens.png"; in addition, review "User Refinement Email.png" and "User Feedback Email.png".

Fit: Please see the same feedback files: "Feedback Screens.png", "User Refinement Email.png", and "User Feedback Email.png". I have also uploaded an example report to this folder, labeled "Example Report Friday.xls", along with an example of what the default error looks like, labeled "Example Default Error.png".

Documentation: Because my product is just an email with a report, I have written the body of the email to address typical user concerns. This email is uploaded to this folder. It is called "Example Email.eml", and a text version is called "Example Email.txt".

Real world viability: My bot is currently up and running. However, because my code runs on my laptop, I have identified three possible solutions for moving it off my laptop, and I am working with my partner to find which will be best. See the attached file called "User Refinement Email.png", which is in this folder. *Update* The Partner Letter is labeled "Screenshot_20191218-140755.png", and copies were emailed and Slacked as well. If too late, I completely understand.

Biography: This concludes the DPU Double Check biography.