How “Big Data” Went Bust | Slate

How “Big Data” Went Bust; ; In Slate; 2017-10-16.
Teaser: And what comes next.

tl;dr → “Big Data” is everywhere nowadays; it is just any “data” (little ‘d’). And the brand was ruined by the activists who tagged it as Big BAD Data; <quote>it’s because the practice had already become so prevalent that it no longer qualified as an “emerging technology.”</quote>
and → Big Data is Facebook; Facebook is bad.
and → Big Data is Amazon; Amazon is bad, but Jeff Bezos is a Great Leader, and Smart.
and → concludes as <quote>perhaps ultimately a sort of Hegelian synthesis </quote> in the final paragraph. <snide> Mistakes will be made, only time will tell, told ya so!</snide> Yup. It’s a Freshman Seminar essay.

Hey ‘bot!

You’re reading this cultural analysis and prognostication in Slate. You going to be okay with that?  They publish articles with titles such as

  • Why the Witch is the Pop-Culture Heroine We Need Right Now,
  • Watch the Uncanny Eyeball Installation That Seems to Watch You Back,
  • Implanted Medical Devices are Saving Lives. They’re Also Causing Exploding Corpses.

OK? … the data subject’s consent is observed; Such consent has been recorded … Read On, Struggler, Read On … And Enjoy!

Mentioned

  • “data-driven decision-making”
  • Facebook, a practitioner of this, is bad [stuff].
  • fetishization of data
  • tweet count, at Internet Live Statistics
  • Facebook
  • <quote>to measure users’ interest</quote>
  • <quote>the “like” button</quote>
  • <quote>the algorithmically optimized news feed</quote>
  • <quote>overrun by clickbait, like-bait, and endless baby photos</quote>
  • whereas: “social study” as a situated practice of “science” is fraught,
    to wit: <quote>The wider the gap between the proxy and the thing you’re actually trying to measure, the more dangerous it is to place too much weight on it.</quote>
  • models are bad,
    models require 3rd parties to analyze, execute & contextualize.
  • Michelle Rhee, ex-schools chancellor, Washington D.C.
  • <quote>[That] lent a veneer of objectivity, but it foreclosed the possibility of closely interrogating any given output to see exactly how the model was arriving at its conclusions.</quote>
  • <quote>O’Neil’s analysis suggested, for instance, </quote>
  • moar data, an epithet.
    cf. moar, defined at Know Your Meme
  • “slow food,”
    is contra “fast food.”
  • Martin Lindstrom
    • a Danish citizen
    • purveyor to the trades, of advice, upon the domain of marketing
  • Lego
    • is a Danish company
    • markets to Millennials
    • an exemplar is identified,
      the trend is: “big data” → “small data”
    • parable by Martin Lindstrom
    • Chronicle of Lego, a business case
      • was data-driven → failure
      • used ethnography → success.
    • Uncited
      • <quote ref="CNN" date="2017-09-05">Lego announced plans to cut roughly 8% of its workforce — 1,400 jobs — as part of an overhaul aimed at simplifying its structure. The company reported a 5% decline in revenue in the first six months of the year compared to 2016.</quote>
      • <ahem>maybe the ethnographers don’t have the deep insight into the zeitgeist after all</ahem>
  • Amazon, uses Big Data
  • Jeff Bezos, CEO, Amazon
  • <quote>Jeff Bezos has an interesting (and, for his employees, intimidating) way of counterbalancing all that impersonal analysis. On a somewhat regular basis, he takes an emailed complaint from an individual customer, forwards it to his executive team, and demands that they not only fix it but thoroughly investigate how it happened and prepare a report on what went wrong.</quote> filed under: how the great ones do it.
  • <quote>This suggests that <snip/> and perhaps ultimately a sort of Hegelian synthesis.</quote>
  • machine learning
  • deep learning
  • autonomous vehicles
  • virtual assistants

Referenced

Previously

In archaeological order, in Slate

Actualities

Building a 300 node Raspberry Pi supercomputer | ZDNet

Building a 300 node Raspberry Pi supercomputer; ; In ZDNet; 2017-09-29.
Teaser: Commodity hardware makes possible massive 100,000 node clusters, because, after all, commodity hardware is “cheap” — if you’re Google. What if you want a lot of cycles but don’t have a few million dollars to spend? Think Raspberry Pi

Original Sources

Affordable and Energy-Efficient Cloud Computing Clusters: The Bolzano Raspberry Pi Cloud Cluster Experiment; Free University of Bozen-Bolzano, Bolzano, Italy; arXiv:1709.06815.
Pekka Abrahamsson, Sven Helmer, Nattakarn Phaphoom, Lorenzo Nicolodi, Nick Preda, Lorenzo Miori, Matteo Angriman, Juha Rikkilä, Xiaofeng Wang, Karim Hamily, and Sara Bugoloni.

Mentions

Architecture

Network

  • “Standard” 802.3 Ethernet (wireline).
  • Snowflake configuration.
    A hierarchical star configuration.
  • Consumer-grade 1Gb/s.
  • Central meta-star switch
    Peripheral star-switches
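The snowflake layout above is a two-level star: one central meta-star switch uplinked to peripheral star switches, each fanning out to its leaf nodes. A minimal sketch of that topology as an adjacency dict (the switch and node counts here are illustrative, not the article's exact figures):

```python
# Two-level "snowflake" topology: a central meta-star switch,
# peripheral star switches, and leaf nodes (illustrative counts).

def build_snowflake(peripheral_switches, nodes_per_switch):
    """Return an adjacency dict for a hierarchical star network."""
    adj = {"meta": set()}
    for s in range(peripheral_switches):
        sw = f"switch{s}"
        adj["meta"].add(sw)
        adj[sw] = {"meta"}
        for n in range(nodes_per_switch):
            node = f"node{s}-{n}"
            adj[sw].add(node)
            adj[node] = {sw}
    return adj

adj = build_snowflake(peripheral_switches=4, nodes_per_switch=8)
links = sum(len(v) for v in adj.values()) // 2  # each link counted twice
print(len(adj), links)  # → 37 36  (1 + 4 + 32 devices; a tree has n-1 links)
```

Because the topology is a tree, every leaf-to-leaf path crosses at most three switches, which is what keeps consumer-grade 1 Gb/s switching viable.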

Storage

  • Flash (SD card) storage is too slow.
  • Must use NAS on HDD on the LAN.

Power design

  • Custom PSU (not “stock” RPi PSU)
  • Repurposed, used, higher-capacity PSUs.
  • Subcluster: 24-nodes/PSU
  • Count: 25 sub-clusters

Mounting (physical design)

  • Bespoke
  • Think it through

Operating System

  • (stock) Debian v7
  • Cannot run OpenStack
  • Bespoke (bare metal) cluster management

Related

  • Some Paper; at Science Direct; no DOI, broken link.
    Basit Qureshi, Yasir Javed, Anis Koubâa, Mohamed-Foued Sriti, Maram Alajlan; Performance of a Low Cost Hadoop Cluster for Image Analysis in Cloud Robotics Environment; In Proceedings of the Symposium on Data Mining Applications (SDMA2016); Riyadh, Saudi Arabia; 2016-03-30 (9 pages).
    tl;dr → Claims to be able to run Hadoop and the Hadoop Image Processing Interface (HIPI) Library for Unmanned Aerial Vehicle (UAV) image processing.
  • Ten (10) Amazing Raspberry Pi Clusters; Some Cub Reporter (SCR); In Network World; WHEN?
  • Some Video; Hosted on YouTube; WHEN?
    tl;dr → Something about using Legos for rack construction, for rack mounting; the physical design of the racks themselves.

Previously

In ZDNet


One Trillion Edges: Graph Processing at Facebook Scale | Ching, Edunov, Kabiljo, Logothetis, Muthukrishnan

Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, Sambavi Muthukrishnan (Facebook); One Trillion Edges: Graph Processing at Facebook Scale; In Proceedings of the Conference on Very Large Data Bases (VLDB); 2015-09-04; 12 pages.

Abstract

Analyzing large graphs provides valuable insights for social networking and web companies in content ranking and recommendations. While numerous graph processing systems have been developed and evaluated on available benchmark graphs of up to 6.6B edges, they often face significant difficulties in scaling to much larger graphs. Industry graphs can be two orders of magnitude larger – hundreds of billions or up to one trillion edges. In addition to scalability challenges, real world applications often require much more complex graph processing workflows than previously evaluated. In this paper, we describe the usability, performance, and scalability improvements we made to Apache Giraph, an open-source graph processing system, in order to use it on Facebook-scale graphs of up to one trillion edges. We also describe several key extensions to the original Pregel model that make it possible to develop a broader range of production graph applications and workflows as well as improve code reuse. Finally, we report on real-world operations as well as performance characteristics of several large-scale production applications.
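The Pregel model the paper extends is a vertex-centric, superstep-at-a-time loop: each vertex reads its neighbors' values, updates its own, and the system repeats until nothing changes. A toy sketch of that style (connected components by minimum-label propagation; illustrative only, not Apache Giraph's actual API):

```python
# Toy Pregel-style computation: each vertex repeatedly adopts the
# smallest label among itself and its neighbors, so every connected
# component converges to its minimum vertex id. Illustrative sketch,
# not Apache Giraph's API.

def connected_components(edges, vertices):
    label = {v: v for v in vertices}
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    changed = True
    while changed:              # one pass = one BSP superstep
        changed = False
        for v in vertices:      # in Pregel these run in parallel
            best = min([label[v]] + [label[n] for n in neighbors[v]])
            if best < label[v]:
                label[v] = best
                changed = True
    return label

print(connected_components([(1, 2), (2, 3), (4, 5)], [1, 2, 3, 4, 5]))
# → {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

At trillion-edge scale the interesting problems are the ones this sketch hides: partitioning the adjacency across machines and batching the per-superstep message traffic, which is where the paper's Giraph work lives.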

Previously

References

46 references

The Log: What every software engineer should know about real-time data’s unifying abstraction | LinkedIn

Jay Kreps (LinkedIn); The Log: What every software engineer should know about real-time data’s unifying abstraction; In Their Blog; 2013-12-16

Mentioned

A tour-de-force of name dropping … largely in order of appearance
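Kreps's central abstraction, the append-only, offset-addressed log that decouples producers from consumers, fits in a few lines. A toy model (not Kafka's API; names here are illustrative):

```python
# Toy append-only log: each record gets a monotonically increasing
# offset, and each consumer tracks its own read position independently.
# A sketch of the idea from Kreps's essay, not Kafka's actual API.

class Log:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1   # offset of the new record

    def read(self, offset):
        return self.records[offset:]   # replay from any offset

log = Log()
log.append("user created")
log.append("user renamed")

# Two consumers at different offsets see the same consistent history.
print(log.read(0))  # → ['user created', 'user renamed']
print(log.read(2))  # → []
```

The point of the essay is that this one structure subsumes replication, change capture, and stream processing: a slow consumer is just a small offset.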

Graph Processing Using Big Data Technologies | InfoQ

Graph Processing Using Big Data Technologies; Charles Menguy; In InfoQ; 2013-03-17.

RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems | He, Lee, Huai, Shao, Jain, Zhang, Xu

Yongqiang He, Rubao Lee, Yin Huai, Zheng Shao, Namit Jain, Xiaodong Zhang, Zhiwei Xu; RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems; In Proceedings of ICDE; 2011; 10 pages.
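RCFile's core idea is a hybrid layout: rows are partitioned horizontally into row groups, and within each group the data is stored column-by-column, so scans can skip unneeded columns while keeping a whole row inside one group. A sketch of that transformation (illustrative, not the actual on-disk format):

```python
# Sketch of RCFile's hybrid layout: partition rows into row groups,
# then store each group column-wise. Illustrative of the idea only,
# not the real on-disk format.

def to_rcfile_layout(rows, group_size):
    """Return a list of row groups, each a list of column vectors."""
    groups = []
    for i in range(0, len(rows), group_size):
        group = rows[i:i + group_size]
        # transpose the group: one contiguous list per column
        columns = [list(col) for col in zip(*group)]
        groups.append(columns)
    return groups

rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
print(to_rcfile_layout(rows, group_size=2))
# → [[[1, 2], ['a', 'b']], [[3, 4], ['c', 'd']]]
```

Within a group, a query touching only column 0 reads `[1, 2]` without deserializing the strings; and because all of a row's fields live in the same group, row reconstruction never crosses a machine boundary.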