Please use SQLite3 and the financial data provided in class(sp500_mafa2024.db) to estimate the ROA, Profit Margin, Asset Turnover, PPE (property, plant and equipment) to Total asse

Homework Assignment #2

1. Please use SQLite3 and the financial data provided in class(sp500_mafa2024.db) to estimate the ROA, Profit Margin, Asset Turnover, PPE (property, plant and equipment) to Total assets, Debt to Total Assets ratios for the information technology industry, and the consumer staples industry respectively from 2012 to 2022. Furthermore, please provide histograms of these ratios for each industry. You need to submit the python code and the histogram outputs.

2. Please explain how business models drive the characteristics of financial ratios for these two industry.

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Topic 3: Exploring data relations and SEC filingn", "n", "<span style="color:darkblue;text-align:center; background-color:lightgreen; opacity: 0.3"> Date modified: Jan, 2024; Code code: AF5123, MAFA </span>n", "n", "n", "### 1. Exploring data relations (basics)n", "n", "* In this notebook, we will further learn to use Seaborn library to explore the relation between two factors (i.e.,variables, or columns) n", "* Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.n", "* To have a complete understanding of Seaborn, please refer to the following link [Seaborn](https://seaborn.pydata.org/tutorial.html)n", "* Below is an example that relates to our course topic:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'''n", "The objective of this code to plot the relation beween sales and SG&A for P&G inc.n", "'''n", "n", "# import libraries n", "import datetime as dtn", "import sqlite3n", "import osn", "#please change D:\Temp to your working directoryn", "os.chdir(r"D:\Temp") n", "import pandas as pdn", "import numpy as npn", "import matplotlib.pyplot as pltn", "import seaborn as sns #import seaborn for plotting datan", "%matplotlib inline n", "print("The most recent time to run this code: {}".format(dt.datetime.now()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* We have provided a data set ( *`sales_022024.db`* — **table name**: `PG_MSFT_30yr`) that includes:n", " * **sale**: sales n", " * **xsga**: SG&An", " * **cogs**: COGS n", "information since 1990 for P&G and Microsoft. n", "- Note: Please copy the data file (`sales_022024.db`) to your working folder." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conn=sqlite3.connect(r"D:\Temp\sales_022024.db") ##establish the connection with sqlite 3 databasen", "firms='0000080424' #P&G's cik coden", "sales=pd.read_sql('''select cik, fyear, datadate, sale, xsga, cogs n", " from PG_MSFT_30yrn", " where cik=?n", " ''', con=conn, params=(firms,)) #read sales, SG&A and COGS for P&G over the past 20 yearsn", "n", "conn.close()n", "## We still can plot scatter points without snsn", "def plot_scatter(sales, cols, col_x = 'sale'):n", " for col in cols:n", " fig = plt.figure(figsize=(7,6)) # define plot arean", " ax = fig.gca() # define axis n", " sales.plot.scatter(x = col_x, y = col, color='red', ax = ax)n", " ax.set_title('Scatter plot of ' + col + ' vs. ' + col_x) # Give the plot a main titlen", " ax.set_ylabel(col) # Set text for the x axisn", " ax.set_xlabel(col_x)# Set text for y axisn", " plt.show()n", "n", "num_cols = ['xsga']n", "plot_scatter(sales, num_cols)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sns.set() #default seaborn settingn", "#make a scatter plot with sns and indicate the pandemic periodn", "sales['COVID-19']=np.where((sales['fyear']>=2020) & (sales['fyear']<=2023), "YES", "NO")n", "sns.scatterplot(x='sale', y='xsga', data=sales, hue='COVID-19')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "##we use seaborn so that we can fit the data points in the linear regression modeln", "sns.regplot(x="sale", y="xsga", data=sales) #linear model – plotting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* <span style="color:darkorange;text-align:center;">Your turn</span>: please use *Microsoft* data since 1990 and provide scatter plots for the relations as follows: (1) the one between COGS and sales; n", "(2) the other between SG&A and sales; " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* You also can easily plot two companies on one figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conn=sqlite3.connect(r"D:\Temp\sales_022024.db") ##establish the connection with sqlite 3 databasen", "sales=pd.read_sql('''select conm, fyear, datadate, sale, xsga, cogs n", " from PG_MSFT_30yrn", " ''', con=conn) #read sales, SG&A and COGS for P&G over the past 20 yearsn", "n", "conn.close()n", "sns.scatterplot(x='sale', y='xsga', data=sales, hue='conm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. SEC EDGAR – crawling data filesn", "* SEC EDGAR provides an easy access to all the disclosures of the public companies in the US. Please See [SEC EDGAR Info.](https://www.sec.gov/edgar/searchedgar/accessing-edgar-data.htm) n", "You can use crawler files to download all the disclosures. However, please keep in mind that you only download the data you need. n", "* Let us download one crawler file and explore its structure n", "* Let us write a code to download information for 2023 (four quarters)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'''n", "The objective of this code is to download crawling information for 10-K files of SP500n", "'''n", "## import libsn", "import datetimen", "import osn", "os.chdir(r"D:\Temp")n", "import requestsn", "import pandas as pdn", "import sqlite3n", "headers = {n", " "User-Agent": "@@USE your own e-mail@@",n", " "Accept-Encoding":"gzip, deflate", n", " "Host": "www.sec.gov" n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "##Step 1: find out the list of SP500 (CIK)n", "conn1=sqlite3.connect(r"D:\Temp\sp500_mafa2024.db") #we create the SP500 list in Topic 2n", "sp_list=pd.read_sql('''n", " select distinct cik n", " from sp_list;n", " ''', con=conn1)#SP 500 cik listn", "cik_list=list(sp_list['cik'].astype(int)) #convert the framework to a integer listn", "n", "##Step 2: set up the period and prepare the weblink listn", "start_year=2023 n", "end_year=2023n", "years = list(range(start_year, end_year+1)) #create a list of years for downloadingn", "quarters = ['QTR1', 'QTR2', 'QTR3', 'QTR4'] #Quarter listn", "history=[] #create a list of tuples for years and quartersn", "for y in years:n", " for q in quarters:n", " history.append((y,q))n", "'''n", "An efficient way to create a list is:n", "history = [(y, q) for y in years for q in quarters]n", "'''n", "urls=[] #create a list of links n", "for x in history:n", " urls.append('https://www.sec.gov/Archives/edgar/full-index/%d/%s/crawler.idx' % (x[0], x[1]))n", "'''n", "An efficient way to create this list is:n", "urls = ['https://www.sec.gov/Archives/edgar/full-index/%d/%s/crawler.idx' % (x[0], x[1]) for x in history]n", "'''n", "# a standardized way to do the string formatting is:n", "#urls = ['https://www.sec.gov/Archives/edgar/full-index/{year:d}/{quarter}/crawler.idx'\n", "#.format(year=x[0], quarter=x[1]) for x in history]n", "urls.sort()n", "n", "## Step 3: start to download the crawler files with filteringn", "cur=conn1.cursor()#create a cursor object so that we can run SQL statementsn", "cur.execute('DROP TABLE IF EXISTS sp10K') #drop idx table if this table existsn", "cur.execute('CREATE TABLE sp10K (conm TEXT, type TEXT, cik INTEGER, date TEXT, path TEXT)') #create a tablen", "n", "for url in urls: #loop over the list of crawler filesn", " lines = requests.get(url, headers=headers).content.decode("utf-8", errors='ignore').splitlines() n", " nameloc = lines[7].find('Company Name')#locate the position of Company namen", " typeloc = lines[7].find('Form Type') #locate the position of File type n", " cikloc = lines[7].find('CIK') # Locate the positin of CIKn", " dateloc = lines[7].find('Date Filed') # Locate the positin of Date Filedn", " urlloc = lines[7].find('URL') # Locate the position of URLn", " records=[]n", " for line in lines[9:]:n", " records.append(tuple([line[:typeloc].strip(), line[typeloc:cikloc].strip(), line[cikloc:dateloc].strip(),n", " line[dateloc:urlloc].strip(), line[urlloc:].strip()]))n", " '''n", " The efficient way to create a list is as follows:n", " records = [tuple([line[:typeloc].strip(), line[typeloc:cikloc].strip(), line[cikloc:dateloc].strip(),n", " line[dateloc:urlloc].strip(), line[urlloc:].strip()]) for line in lines[9:]]n", " '''n", " for record in records:n", " if record[1]=="10-K" and int(record[2]) in cik_list:n", " cur.execute('INSERT INTO sp10K VALUES (?, ?, ?, ?, ?)', record)n", " print(url, 'downloaded and wrote to SQLite')n", "n", "conn1.commit()n", "conn1.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* With the URL list in the data table, we can download all the 10 K files for all the SP500 companies that filed annual reports in 2023n", "* Note: you need to review what we have done in Topic 1 (computing work)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }

Please use SQLite3 and the financial data provided in class(sp500_mafa2024.db) to estimate the ROA, Profit Margin, Asset Turnover, PPE (property, plant and equipment) to Total asse

Interested in getting help with this assignment?

Recent posts