Leading with commas — ugly or efficient? An investigation over 320 GB of SQL code | Hackernoon

Leading with commas — ugly or efficient? An investigation over 320 GB of SQL code;
Felipe Hoffa (staff?); In Hackernoon; 2017-07-26.
Teaser: Winning arguments with data: Let’s analyze 320 Gigabytes of open source SQL code to determine if we should use trailing or leading commas. Popularity is not enough — can we determine which style leads to success?
Felipe Hoffa, Developer Advocate @Google, San Francisco


# trailing commas
SELECT name,
FROM `employees`
WHERE state='CA'
# leading commas
  , company
  , salary
  , state
  , city
FROM `employees`
WHERE state='CA'


  • Categories
    • Leading Commas → fewest projects
    • Mixed Style → some more
    • Trailing Commas → the majority
  • Projects that allow a mix of styles show the most success.
  • Projects that enforce leading commas
    •  don’t show as much success as mixed
    • more successful than trailing
  • The trend is stable throughout the years  2016 & 2017.


presented as an image, not as data:


  • BigQuery
  • SQL is shown.


WITH comma_lines_per_files AS  (
    SELECT sample_repo_name, sample_stars_2016, sample_stars 
      , REGEXP_CONTAINS(line, r',\s*$') has_trailing 
      , REGEXP_CONTAINS(line, r'^\s*,') has_leading
      , line
    FROM `fh-bigquery.github_extracts.contents_sql`
      , UNNEST(SPLIT(content, '\n')) line
    WHERE line LIKE '%,%'
    AND LENGTH(line)>5
), stats_per_repo AS (
  SELECT sample_repo_name
    , MAX(has_leading) has_leading
    , MAX(has_trailing) has_trailing
    , ANY_VALUE(line) sample_line
    , ANY_VALUE(sample_stars) stars
    , ANY_VALUE(sample_stars_2016) stars_2016
    , (SELECT COUNT(DISTINCT actor.id) FROM `githubarchive.month.2017*` WHERE sample_repo_name = repo.name AND type='WatchEvent') stars_2017
    , (SELECT COUNT(DISTINCT actor.id) FROM `githubarchive.month.2017*` WHERE sample_repo_name = repo.name) actors_2017
    , (SELECT COUNT(*) FROM `githubarchive.month.2017*` WHERE sample_repo_name = repo.name) activity_2017
  FROM comma_lines_per_files
  GROUP BY sample_repo_name
SELECT COUNT(DISTINCT sample_repo_name) repos
  , IF(has_trailing, IF(has_leading, 'both', 'trailing'), IF(has_leading, 'leading', 'none')) commas 
  , ROUND(AVG(stars), 2) avg_stars
  , ROUND(AVG(stars_2016), 2) stars_2016
  , ROUND(AVG(stars_2017), 2) stars_2017
  , ROUND(AVG(actors_2017), 2) actors_2017
  , ROUND(AVG(activity_2017), 2) activity_2017
  , STRING_AGG(sample_repo_name ORDER BY stars DESC LIMIT 3) top_repos
FROM stats_per_repo
GROUP BY commas
ORDER BY repos
repos commas stars stars17 actors17 activity17 top_repos 571 leading 22.99 7.36 10.04 39.89 drone/drone,aspnetboilerplate/aspnetboilerplate,HazyResearch/deepdiv2847 both true 29.37 6.44 11.73 156.63 apache/spark,begriffs/postgrest,mybatis/mybatis-3  
5933 none false 20.05 4.8 7.57 54.43 ajaxorg/ace,zulip/zulip,fivethirtyeight/data  
69665 trailing false 13.06 3.22 5.49 43.68 Microsoft/vscode,rails/rails,kubernetes/kubernetes

SoK: Cryptographically protected database search | Fuller et al.

Fuller et al.; SoK: Cryptographically proctected database search; In Proceedings of IEEE Symposium on Security and Privacy (SP), 2017-03-06 → 2017-06-02; arXiv:1703.02014, IEEE.



Protected database search systems cryptographically isolate the roles of reading from, writing to, and administering the database. This separation limits unnecessary administrator access and protects data in the case of system breaches. Since protected search was introduced in 2000, the area has grown rapidly; systems are offered by academia, start-ups, and established companies.

However, there is no best protected search system or set of techniques. Design of such systems is a balancing act between security, functionality, performance, and usability. This challenge is made more difficult by ongoing database specialization, as some users will want the functionality of SQL, NoSQL, or NewSQL databases. This database evolution will continue, and the protected search community should be able to quickly provide functionality consistent with newly invented databases.

At the same time, the community must accurately and clearly characterize the tradeoffs between different approaches. To address these challenges, we provide the following contributions:

  1. An identification of the important primitive operations across database paradigms. We find there are a small number of base operations that can be used and combined to support a large number of database paradigms.
  2. An evaluation of the current state of protected search systems in implementing these base operations. This evaluation describes the main approaches and tradeoffs for each base operation. Furthermore, it puts protected search in the context of unprotected search, identifying key gaps in functionality.
  3. An analysis of attacks against protected search for different base queries.
  4. A roadmap and tools for transforming a protected search system into a protected database, including an open-source performance evaluation platform and initial user opinions of protected search.



Bitglass, Ciphercloud, CipherQuery, Crypteron, IQrypt, Kryptnostic, Google’s Encrypted BigQuery, Microsoft’s SQL Server 2016, Azure SQL Database, PreVeil, Skyhigh, StealthMine, ZeroDB


Query Types

  • equality
  • boolean
  • range
  • other

Protection Types

  • legacy
  • custom
  • oblivious
  • other


  • Single table
  • With indices
  • Multiple tables


Legacy Schemes

Deterministic Encryption (DET)
preserves only equality but applying a randomized but fixed permutation to all messages.
Order-Preserving Encryption (OPE)
preserves the relative order of the plaintexts; range queries.
Mutable OPE
only reveals the order of ciphertexts; added interactivity during insertion and query execution.

Custom schemes

  • Inverted index schemes
  • Tree traversal schemes

Oblivious schemes

… aim to hide common results between queries.

Oblivious RAM (ORAM)
performance problems
latest type of ORAM
a second non-colluding server

Full database solutions

enables most DBMS functionality with a performance overhead of under 30%.
is built on top of MongoDB and reports a performance overhead of approximately 10%.
reports slowdowns of between 20% and 300% for most queries
EXT can occasionally beat a MySQL system with a cold cache (a somewhat strange comparison!), but are an order of magnitude slower than MySQL with a warm cache.
reports a 500% slowdown compared to a baseline MySQL system on keyword equality and range queries.


  • SummarizationSystematization of Knowledge (SoK)
  • Data Base Management System (DBMS)
  • searchable symmetric encryption
  • property preserving encryption
  • database search
  • oblivious random access memory
  • private information retrieval
  • Property-Revealing Encryption (PRE)




There are 162 references. As time moves on, check the IEEE, one day they will have the paper & its references.

Type System Criteria | Tim Bray

Tim Bray; Type-System Criteria; In His Blog; 2011-12-28.


Criteria · Let’s call them the Bánffy-Bray criteria for selecting between static and dynamic type systems.

  1. Static typing’s attractiveness is a direct function (and dynamic typing’s an inverse function) of API surface size.
  2. Dynamic typing’s attractiveness is a direct function (and static typing’s an inverse function) of unit testing workability.

Peta-Scale Data Warehousing at Yahoo! | et al.

Mona Ahuja, Cheng Che Chen, Ravi Gottapu, Jörg Hallmann, Waqar Hasan, Richard Johnson, Maciek Kozyrczak, Ramesh Pabbati, Neeta Pandit, Sreenivasulu Pokuri, Krishna Uppala; Peta-Scale Data Warehousing at Yahoo!; In Proceedings of SIGMOD; 2009-06-29; 7 pages.