Menu
  • Home
  • About
  • Blog
  • Thought Leader Interviews
  • Contact

Exploring IRS data with SQL in Google BigQuery

Manu Jeevan

November 20, 2018

24

Share this post

In this post, we will  perform an analysis on the IRS dataset.

We will be answering these two questions through our analysis:

  1. Largest employer per U.S. state per 2015 filing
  2. Calculate the `avg_income` (revenue – expenses) and sort the results by avg_income from highest to lowest. Calculate ‘avg_income’ for the years 2013, 2014, 2015 and 2016.

About the IRS dataset

Form 990 is used by the United States Internal Revenue Service to gather financial information about nonprofit/exempt organizations. This BigQuery dataset can be used to perform research and analysis of organizations that have electronically filed Forms 990, 990-EZ and 990-PF. For a complete description of data variables available in this dataset, see the IRS’s extract documentation: https://www.irs.gov/uac/soi-tax-stats-annual-extract-of-tax-exempt-organization-financial-data

Question 1:

Filtering using common table expression(CTE)  

First, I include ein (employer identification number), name, state and noemplyeesw3cnt (number of employees) in the SELECT statement.

Then I join irs_990_2015 table on the we are going to JOIN on the organizational details (irs_990_ein) table. I will use USING clause to as a key to join the tables.

If it’s the exact same name of the field across each different datasets, you can use the USING clause and it’ll save you just a little bit of typing.

Then I create a partition by state and order by the number of employees in descending order.

(PARTITION BY state ORDER BY noemplyeesw3cnt DESC )

RANK() OVER (PARTITION BY state ORDER BY noemplyeesw3cnt DESC ) AS rank

Now, I rank the records over this particular partition that I created in parentheses using a RANK() OVER function.

Now, we have to filter these records for the top ranked ones.

I am going convert the above query output into a temporary table using a WITH clause.  The WITH clause is called the common table expression.

Finally, I select all the records from this temporary table and filter only the records with rank 1. And then de-duplicate the records by using a group by clause inside the temporary table.

 

 

So it looks like A FAMILY AFFAIR BRIDGES INCORPORATED has  787,000 employees in California. And then we have CROSSBRIDGES,  KAISER FOUNDATIONAL HOSPITALS in Oregon.

Question 2:

I start by calculating the average revenue and expenses for the year 2015. Now, we want to include 2013 and 2014 too. I am going to use a table wildcard for that.

The table_suffix is exactly what that wildcard is replacing.  The great thing about table suffix, is you can now invoke that as a new field and potentially filter on it in your where clause as well.

I’m going to just execute the following query so you can see the value of table suffix.

The query failed because anytime you invoke something that’s not an aggregate function, you immediately need to have a group by.  So you can group by the index of the column, or you can actually group by the alias in this particular case.

The filing year is shown as 15, 13, 12, etc. because we hard coded this prefix: irs_990_20*.

I am going to concatenate the prefix in there.

I am going to use a WHERE to clause to only include the values from 2013 to 2015.

The query returned zero records because I mentioned ‘2013’ in the WHERE clause instead of mentioning ’13’. So, we have to keep in mind what that wildcard is set as.

I wrap the above query into a common table expression and calculate the average income.

 

The income of these charities have been growing from 2013 to 2015 but declined in the year 2016.

 

Comments

4

  • porno December 8th, 2020

    I believe you have observed some very interesting details , appreciate it for the post. Eveline Keenan Fanchan

    Reply
  • erotik film izle December 8th, 2020

    There is definately a great deal to find out about this issue. I love all of the points you made. Adelaida Kienan Sitnik

    Reply
  • erotik izle December 9th, 2020

    Some genuinely select articles on this internet site , saved to bookmarks . Alicea Hermy Tuchman

    Reply
  • erotik film izle December 9th, 2020

    You completed a number of nice points there. I did a search on the subject matter and found most folks will consent with your blog. Tommy Aldis Jarrid

    Reply

Leave a Comments Cancel Reply

Recent Posts

  • Exploring IRS data with SQL in Google BigQuery

    November 20, 2018
  • Common SQL JOIN pitfalls

    November 19, 2018
  • Exploring ecommerce data with SQL in Google BigQuery

    November 19, 2018

© 2016 [blog-link], All Rights Reserved.