An Introduction To Open Data and the Socrata Portal
What I will talk about tonight:
Overview of Open Data
Introduction to the Socrata Platform
Hands-on demos
Whoami: Robert Harker.
Bay Area native.
- Born and raised in Palo Alto.
- Lifelong interest in urban planning/issues and bay ecology.
San Mateo resident/homeowner.
Linux Systems Administrator.
- Early Sun Microsystems employee supporting Sun's corporate IT infrastructure.
- 30 years of experience building the Internet.
Open Data Evangelist.
- Working as a volunteer on San Mateo County's Open Data portal
Trying to find "interesting" datasets
- I have a wide range of operational experience I can contribute.
Web technology, Internet protocols,
Data center server operations, DevOps software development process.
Please make an accommodation for my severe allergic reaction to PowerPoint
Slides prepared using vi and W3C Slidy,
a presentation system/template implemented in 6 CSS style sheet files and one JavaScript file.
What is Open Data?
Open Data is API-based datasets stored in the cloud:
- Stored in the cloud as datasets hosted on Open Data portals
- Accessed via an Application Programming Interface (API)
- Searchable
Open Data:
Technically open - Machine readable formats
Legally open - Anyone can re-use and re-distribute the data
Free - Free as in beer
Open Data is:
- Public data; anyone can use it however they want
- Typically licensed under a Creative Commons (CC) type license
- Interesting and boring datasets
Data is published as Open Data for:
- An organization's own internal use
- The public's use
Open Data Datasets
A dataset is a self-contained collection of data.
- Including "metadata" about the dataset itself
Information is published in a dataset
- A dataset may be one or more sets of tabular data
- A dataset may contain non-text/non-numerical data:
Bitmaps, images, .pdf files, formatted binary data
- A dataset has metadata information about:
Description, owner and creation of the dataset
Definition of the column data types (schema) in the dataset
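The pieces of a dataset listed above can be pictured as a small data structure. This is only a sketch: the class names, field names, and sample values are hypothetical and are not how any particular portal stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One entry in the dataset's schema (hypothetical structure)."""
    name: str
    data_type: str  # for example "text", "number", or "date"


@dataclass
class Dataset:
    """A self-contained dataset: metadata plus the tabular rows."""
    description: str
    owner: str
    created: str  # creation date as an ISO 8601 string
    columns: list[Column] = field(default_factory=list)
    rows: list[dict] = field(default_factory=list)

    def column_names(self) -> list[str]:
        """Return the column names defined by the schema."""
        return [column.name for column in self.columns]


# Illustrative values only; not taken from a real portal.
example = Dataset(
    description="Example incident log",
    owner="Example County",
    created="2015-01-01",
    columns=[Column("incident_id", "text"), Column("received", "date")],
    rows=[{"incident_id": "A1", "received": "2015-01-01"}],
)
print(example.column_names())
```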
Open Data Datasets (Cont.)
Types of datasets
- Tabular data
- Map geo-location or shape data
- Structured documents
- Metadata
Open Data Portals
An Open Data Portal is
- An entry point into
a collection of datasets
An Open Data Portal defines:
- A base URL published by an organization
- An administrative domain
The County of San Mateo's Open Data portal is:
APIs are the future
The ability to use an app across many Open Data portals
- An out of town friend shows you a neat county government app
- You realize your county published the same data
- You change the portal from his county's portal to yours:
- Suddenly you are seeing your local data on his app
Like different apps that currently use Apple's iTunes.
- A different app lets you play your music in a different way
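A minimal sketch of the "change the portal" idea, assuming the app keeps the portal address and dataset identifier in one small configuration object. The friend's portal URL and both dataset IDs are made-up placeholders; in practice each portal would most likely publish the same kind of data under its own ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortalConfig:
    """Everything the app needs to know about one portal (hypothetical)."""
    base_url: str
    dataset_id: str


def resource_url(config: PortalConfig) -> str:
    """Build the dataset URL the app will query."""
    return f"{config.base_url.rstrip('/')}/resource/{config.dataset_id}"


# Placeholder values: the friend's portal and both dataset IDs are invented.
friends_county = PortalConfig("https://data.example.org", "aaaa-1111")
my_county = PortalConfig("https://data.smcgov.org", "abcd-wxyz")

# The rest of the app stays the same; only the configuration changes.
for config in (friends_county, my_county):
    print(resource_url(config))
```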
Problem: APIs are "User Hostile"
- Someone must write code
https://data.smcgov.org/resource/dataset?index_val=name&return_val=
And it's even worse because Socrata names datasets with IDs like "abcd-wxyz":
https://data.smcgov.org/resource/abcd-wxyz?index_val=name&return_val=
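This is the kind of code someone has to write. The sketch below only builds a request URL; it does not contact the portal. The helper name `build_query_url` is hypothetical, "abcd-wxyz" is the placeholder ID from the slide, and the `.csv` suffix and `$limit` parameter are assumptions based on Socrata's published query conventions, so check the portal's API documentation before relying on them.

```python
from urllib.parse import urlencode


def build_query_url(portal: str, dataset_id: str, fmt: str, params: dict) -> str:
    """Assemble a dataset query URL from its parts (hypothetical helper)."""
    base = f"{portal.rstrip('/')}/resource/{dataset_id}.{fmt}"
    if not params:
        return base
    # Keep "$" readable in the query string instead of encoding it as %24.
    return f"{base}?{urlencode(params, safe='$')}"


# "abcd-wxyz" is a placeholder ID, not a real dataset.
url = build_query_url("https://data.smcgov.org", "abcd-wxyz", "csv", {"$limit": 200})
print(url)
```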
Why are APIs powerful?
APIs are powerful because they define a clear interface
- Between the client (User) and server (Cloud)
Technology on either side may change
- But as long as the API works the same, no one is affected
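A toy illustration of that point, assuming a made-up interface with a single `fetch` method. Two very different back ends satisfy it, and the client code cannot tell them apart. All names here are hypothetical.

```python
from typing import Protocol


class DatasetAPI(Protocol):
    """The agreed interface: ask for rows, get a list of records back."""

    def fetch(self, dataset_id: str, limit: int) -> list[dict]: ...


class InMemoryBackend:
    """One server technology: rows held in a dictionary."""

    def __init__(self, tables: dict[str, list[dict]]) -> None:
        self._tables = tables

    def fetch(self, dataset_id: str, limit: int) -> list[dict]:
        return self._tables[dataset_id][:limit]


class GeneratedBackend:
    """A completely different technology: rows computed on demand."""

    def fetch(self, dataset_id: str, limit: int) -> list[dict]:
        return [{"dataset": dataset_id, "row": i} for i in range(limit)]


def count_rows(api: DatasetAPI, dataset_id: str, limit: int) -> int:
    """Client code: depends only on the interface, not on the back end."""
    return len(api.fetch(dataset_id, limit))


old_server = InMemoryBackend({"abcd-wxyz": [{"row": 0}, {"row": 1}, {"row": 2}]})
new_server = GeneratedBackend()
print(count_rows(old_server, "abcd-wxyz", 2), count_rows(new_server, "abcd-wxyz", 2))
```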
What is an Application Programming Interface, API?
An HTTP-based protocol for requesting information
- and getting results back as an HTTP response
An API is like a conversation:
- You ask a question. You get an answer.
I am querying the E911 dataset for the last 200 incidents
- Here are the records in Comma Separated Values (CSV) format
- The first line contains the names of the columns
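A short sketch of reading such an answer with Python's standard `csv` module, which treats the first line as the column names. The sample text and its column names are invented for illustration; they are not the real E911 dataset.

```python
import csv
import io

# Invented sample response; the real dataset's columns will differ.
SAMPLE_RESPONSE = (
    "incident_id,received,category\n"
    "A1,2015-01-01T08:00:00,medical\n"
    "A2,2015-01-01T08:05:00,fire\n"
)


def parse_csv_response(text: str) -> tuple[list[str], list[dict]]:
    """Split a CSV response into column names and one dict per record."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # reading the rows also consumes the header line
    return list(reader.fieldnames or []), rows


columns, records = parse_csv_response(SAMPLE_RESPONSE)
print(columns)
print(records[0]["category"])
```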
Standards are important
Naming of datasets
Naming of data columns
Data field names and types
Metadata about the dataset
The Open Data Marketplace
Two major players
- CKAN and Open Knowledge (OKFN): founded 2010?
- Socrata: founded 2007
Hosted on virtual servers
Hosted by cloud Infrastructure as a Service (IaaS) providers
ArcGIS is not an Open Data Portal.
- They charge for their data.
Socrata
The County made a good selection choosing Socrata:
Dominant market share: 250+ customers including:
- Some Federal cabinet-level departments and the State of California,
- San Francisco, Alameda and San Mateo Counties
Sound technology:
- Modern RESTful API supporting CSV, JSON and other data formats
- Cloud-based distributed back-end database
- User-friendly web-based user interface
- Useful tools provided for unsophisticated users to view and analyze the data
- Full support for internationalization (i18n).
Socrata offers support
You can file Socrata trouble tickets:
- https://support.socrata.com
Use your County Open Data portal login to open a ticket
Socrata does not have forums
Instead they want you to use Stack Overflow and ask questions
- Limited success for me
- http://stackoverflow.com/questions/tagged/socrata
Socrata dataset publishing tools
Supports CSV and JSON formats
Bulk upload, incremental update tools
End user upload tool if account has permission
Administrative tools to manage portal and accounts
Socrata web-based User Interface (UI)
Socrata has a focus on easy-to-use web-based tools
In Socrata a "view" is a formatted extract of the data:
- A map, graph, chart, or table
Formatting tools are available to work with a dataset or view
Tools are easy to use
- This means the tools are limited in what they can do
Allows everyday users to make useful views:
- The public
- Government/organizational staff
2 to 5 minute demos
A tour of the Socrata web-based User Interface (UI)
Make your first Socrata map
Make a mini map app
Socrata internationalization (i18n)
Instructions to follow along can be found on:
Discussion:
Questions?
Interesting datasets?
Thank You
Robert Harker
<harker@harker.com>