Leading with commas — ugly or efficient? An investigation over 320 GB of SQL code | Hackernoon

Leading with commas — ugly or efficient? An investigation over 320 GB of SQL code;
Felipe Hoffa (staff?); In Hackernoon; 2017-07-26.
Teaser: Winning arguments with data: Let’s analyze 320 Gigabytes of open source SQL code to determine if we should use trailing or leading commas. Popularity is not enough — can we determine which style leads to success?
Felipe Hoffa, Developer Advocate @Google, San Francisco

Concept

# trailing commas
SELECT name,
  company,
  salary,
  state,
  city
FROM `employees`
WHERE state='CA'
# leading commas
SELECT name
  , company
  , salary
  , state
  , city
FROM `employees`
WHERE state='CA'

Summary

  • Categories
    • Leading Commas → fewest projects
    • Mixed Style → some more
    • Trailing Commas → the majority
  • Projects that allow a mix of styles show the most success.
  • Projects that enforce leading commas
    •  don’t show as much success as mixed
    • more successful than trailing
  • The trend is stable throughout the years  2016 & 2017.

Results

presented as an image, not as data:

Mentions

  • BigQuery
  • SQL is shown.

Actualities

#standardSQL
WITH comma_lines_per_files AS  (
    SELECT sample_repo_name, sample_stars_2016, sample_stars 
      , REGEXP_CONTAINS(line, r',\s*$') has_trailing 
      , REGEXP_CONTAINS(line, r'^\s*,') has_leading
      , line
    FROM `fh-bigquery.github_extracts.contents_sql`
      , UNNEST(SPLIT(content, '\n')) line
    WHERE line LIKE '%,%'
    AND LENGTH(line)>5
), stats_per_repo AS (
  SELECT sample_repo_name
    , MAX(has_leading) has_leading
    , MAX(has_trailing) has_trailing
    , ANY_VALUE(line) sample_line
    , ANY_VALUE(sample_stars) stars
    , ANY_VALUE(sample_stars_2016) stars_2016
    , (SELECT COUNT(DISTINCT actor.id) FROM `githubarchive.month.2017*` WHERE sample_repo_name = repo.name AND type='WatchEvent') stars_2017
    , (SELECT COUNT(DISTINCT actor.id) FROM `githubarchive.month.2017*` WHERE sample_repo_name = repo.name) actors_2017
    , (SELECT COUNT(*) FROM `githubarchive.month.2017*` WHERE sample_repo_name = repo.name) activity_2017
  FROM comma_lines_per_files
  GROUP BY sample_repo_name
)
SELECT COUNT(DISTINCT sample_repo_name) repos
  , IF(has_trailing, IF(has_leading, 'both', 'trailing'), IF(has_leading, 'leading', 'none')) commas 
  , ROUND(AVG(stars), 2) avg_stars
  , ROUND(AVG(stars_2016), 2) stars_2016
  , ROUND(AVG(stars_2017), 2) stars_2017
  , ROUND(AVG(actors_2017), 2) actors_2017
  , ROUND(AVG(activity_2017), 2) activity_2017
  , STRING_AGG(sample_repo_name ORDER BY stars DESC LIMIT 3) top_repos
FROM stats_per_repo
GROUP BY commas
ORDER BY repos
repos commas stars stars17 actors17 activity17 top_repos 571 leading 22.99 7.36 10.04 39.89 drone/drone,aspnetboilerplate/aspnetboilerplate,HazyResearch/deepdiv2847 both true 29.37 6.44 11.73 156.63 apache/spark,begriffs/postgrest,mybatis/mybatis-3  
5933 none false 20.05 4.8 7.57 54.43 ajaxorg/ace,zulip/zulip,fivethirtyeight/data  
69665 trailing false 13.06 3.22 5.49 43.68 Microsoft/vscode,rails/rails,kubernetes/kubernetes

Comments are closed.