Showing posts with label data analytics. Show all posts

Tuesday, October 16, 2018

16/10/18: Data analytics. It really is messier than you thought


An interesting study (H/T to @stephenkinsella) highlights a problem with the empirical determinism that underpins our (human) evolving trust in 'Big Data' and 'analytics': statistical analysis of social, business, financial and similar data is anything but deterministic.

Here is the setup: the researchers assembled 29 independent teams, comprising 61 analysts in total. Each team received the same data set on football referees' decisions to give red cards to players, and each was asked to test the same hypothesis: whether football "referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players".

Because the teams used different analytic models, their estimates spanned a wide range: the effect of a player's skin tone on red card issuance ran from 0.89 at the lower end of the range to 2.93 at the higher end, with a median effect of 1.31. Per the authors, "twenty teams (69%) found a statistically significant positive effect [meaning that they found the skin color having an effect on referees decisions], and 9 teams (31%) did not observe a significant relationship" [meaning, no effect of the players' skin color was found].

To rule out the possibility that analysts' prior beliefs influenced their findings, the researchers controlled for such beliefs. In the end, prior beliefs did not explain the differences in findings. Worse, "peer ratings of the quality of the analyses also did not account for the variability." Put differently, the vast differences in the results cannot be explained by either the quality of the analysis or the analysts' priors.

The authors conclude that even absent biases and personal prejudices of the researchers, "significant variation in the results of analyses of complex data may be difficult to avoid... Crowdsourcing data analysis, a strategy in which numerous research teams are recruited to simultaneously investigate the same research question, makes transparent how defensible, yet subjective, analytic choices influence research results."
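To make the "defensible, yet subjective, analytic choices" point concrete, here is a minimal Python sketch using entirely hypothetical counts (not the study's data): two equally reasonable ways of handling players whose skin-tone ratings were ambiguous yield two different odds-ratio estimates from the very same data set.

```python
# Illustrative only: hypothetical red-card counts, NOT the study's data.
# In the real study, two raters scored each player's skin tone; analysts
# had to decide (among many other choices) how to handle disagreements.

def odds_ratio(dark_red, dark_no, light_red, light_no):
    """Odds of a red card for dark- vs light-skin-toned players."""
    return (dark_red / dark_no) / (light_red / light_no)

# Hypothetical counts: (red cards, no red card)
dark = (40, 960)
light = (55, 1945)
ambiguous = (12, 388)   # raters disagreed on skin tone

# Choice A: drop the ambiguous players from the analysis.
or_drop = odds_ratio(*dark, *light)

# Choice B: pool the ambiguous players with the dark-toned group.
or_pool = odds_ratio(dark[0] + ambiguous[0], dark[1] + ambiguous[1], *light)

print(round(or_drop, 2), round(or_pool, 2))  # → 1.47 1.36
```

Same data, two defensible preprocessing choices, two different estimates; multiply this by dozens of such choices (model family, covariates, clustering of errors) and the study's 0.89-to-2.93 spread becomes unsurprising.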

Good luck putting much trust into social data analytics.

Full paper is available here: http://journals.sagepub.com/doi/pdf/10.1177/2515245917747646.

Friday, October 9, 2015

9/10/15: Is Economics Research Replicable?… err… ”Usually Not”


An interesting, albeit limited by the size of the sample, paper on replicability of research findings in Economics (link here).

The authors took 67 papers published in 13 "well-regarded economics journals" and attempted to replicate each paper's reported findings. They asked the papers' authors and the journals for the original data and code used in preparing each paper (some top economics journals require, as a condition of publication, that data and estimation code be disclosed alongside the paper).

“Aside from 6 papers that use confidential data, we obtain data and code replication files for 29 of 35 papers (83%) that are required to provide such files as a condition of publication, compared to 11 of 26 papers (42%) that are not required to provide data and code replication files.”

Here is the top line conclusion from the study: “We successfully replicate the key qualitative result of 22 of 67 papers (33%) without contacting the authors. Excluding the 6 papers that use confidential data and the 2 papers that use software we do not possess, we replicate 29 of 59 papers (49%) with assistance from the authors.”
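As a quick sanity check, the percentages quoted above follow directly from the raw counts reported in the paper; a throwaway Python sketch:

```python
# Counts quoted above from the paper: (successes, total attempted).
results = {
    "data/code obtained when required by journal": (29, 35),
    "data/code obtained when not required": (11, 26),
    "replicated without contacting authors": (22, 67),
    "replicated with author assistance": (29, 59),
}

for label, (k, n) in results.items():
    print(f"{label}: {k}/{n} = {100 * k / n:.0f}%")
# → 83%, 42%, 33%, 49% respectively
```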

In other words, in many cases even assistance from the papers' original authors was not enough to reproduce the published results.

“Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable.”

This is hardly new, as noted by the study authors. “Despite our finding that economics research is usually not replicable, our replication success rates are still notably higher than those reported by existing studies of replication in economics. McCullough, McGeary, and Harrison (2006) find a replication success rate for articles published in the JMCB of 14 of 186 papers (8%), conditioned on the replicators’ access to appropriate software, the original article’s use of non-proprietary data, and without assistance from the original article’s authors. Adding a requirement that the JMCB archive contain data and code replication files the paper increases their success rate to 14 of 62 papers (23%). Our comparable success rates are 22 of 59 papers (37%), conditioned on our having appropriate software and non-proprietary data, and 22 of 38 papers (58%) when we impose the additional requirement of having data and code files. Dewald, Thursby, and Anderson (1986) successfully replicate 7 of 54 papers (13%) from the JMCB, conditioned on the replicators having data and code files, the original article’s use of non-confidential data, help from the original article’s authors, and appropriate software. Our comparable figure is 29 of 38 papers (76%).”

A handy summary of results:

[Table omitted: summary of replication success rates from the paper]
So in basic terms, economists are not only pretty darn useless at achieving forecasting accuracy (which we know, and don't really care about, for reasons too hefty to explain here), but we are also pretty darn useless at producing replicable results from our own empirical studies using the same data. Hmmm…

Sunday, October 4, 2015

4/10/15: Data is not the end of it all, it’s just one tool...


Recently, I spoke at the very interesting Predict conference, covering the philosophy and macro-implications of data analytics in our economy and society. I posted the slides from my presentation earlier here.

Here is a quick interview recorded by the Silicon Republic covering some of the themes discussed at the conference: https://www.siliconrepublic.com/video/data-is-not-the-end-of-it-all-its-just-one-tool-dr-constantin-gurdgiev.


Thursday, September 17, 2015

17/9/15: Predict Conference: Data Analytics in the Age of Higher Complexity


This week I spoke at the Predict Conference on the future of data analytics and predictive models. Here are my slides from the presentation:

[Slides omitted: embedded presentation in the original post]
Key takeaways:

  • Analytics are being shaped by dramatic changes in demand (the consumer side of data supply), by a changing environment of macroeconomic and microeconomic uncertainty (risk complexity and dynamics), and by technological innovation (on the supply side, via the enablement that new technology delivers to the field of analytics, especially in qualitative and small-data areas; on the demand side, via the increased speed and uncertainty that new technologies generate).
  • On the demand side: consumer behaviour is complex, and understanding even the 'simpler truths' requires more than a simple data insight; consumer demand is now being shaped by the growing gap between consumer typologies and the behavioural environment.
  • On the micro uncertainty side, consumers and other economic agents are operating in an environment of exponentially increasing volatility, including income uncertainty, timing variability (lumpiness) of income streams and decisions, a highly uncertain environment concerning life-cycle incomes and wealth, etc. This implies the growing importance of non-Gaussian distributions in the statistical analysis of consumer behaviour and, simultaneously, an increasing need for qualitative and small-data analytics.
  • On the macro uncertainty side, interactions between domestic financial, fiscal, economic and monetary systems are growing more complex, and systems interdependencies imply growing fragility. Beyond this, international systems are now tightly connected to domestic systems, and the generation and propagation of systemic shocks are no longer contained within national, regional or even super-regional borders. Macro uncertainty is now directly influencing micro uncertainty and is shaping consumer behaviour in the long run.
  • Technology, traditionally viewed as an enabler of robust systemic responses to risks and uncertainty, is now acting to generate greater uncertainty and to accelerate the propagation of shocks through economic systems (speed and complexity).
  • The majority of the crisis-resolution mechanisms deployed in recent years have contributed to increasing systems fragility by enhancing the over-confidence bias: excessive reliance on systems consolidation, centralisation and technocratic responses decreases the systems distribution necessary to address the 'unknown unknowns' nature of systemic uncertainty. Excessive reliance, within business analytics (and policy formation), on Big Data is reducing our visibility of smaller risks and creates a false perception of safety in centralised regulatory and supervisory regimes.
  • Instead, fragility-reducing solutions require greater reliance on highly distributed and dispersed systems of regulation, linked to strong supervision, to simultaneously allow greater rate of risk / response discovery and control the downside of such discovery processes. Big Data has to be complemented by more robust and extensive deployment of the 'craft' of small data analytics and interpretation. Small events and low amplitude signals cannot be ignored in the world of highly interconnected systems.
  • Overall, predictive data analytics will have to evolve toward enabling a shift in our behavioural systems from simple nudging toward behavioural enablement (via automation of routine decisions: e.g. compliance with medical procedures) and behavioural activation (actively responsive behavioural systems that help modify human responses).
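The non-Gaussian point in the takeaways above can be illustrated with a short, stdlib-only Python sketch (standard textbook results, not anything from the talk): tail events that are effectively impossible under a Gaussian remain very real under a heavy-tailed (power-law) distribution, which is why Gaussian models systematically understate extreme income or market shocks.

```python
import math

def normal_tail(k):
    """P(Z > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def pareto_tail(k, alpha=3.0):
    """P(X > k) for a unit-scale Pareto (power-law) tail, k >= 1."""
    return k ** (-alpha)

# A '10-sigma'-style event: roughly 1-in-10^23 under the Gaussian,
# but about one-in-a-thousand under this power law.
for k in (2, 5, 10):
    print(f"k={k}: normal {normal_tail(k):.2e}, power-law {pareto_tail(k):.2e}")
```

The tail exponent alpha=3.0 here is an arbitrary illustrative choice; empirical estimates for incomes and asset returns vary, but the qualitative gap between the two tails survives any reasonable value.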

Saturday, October 12, 2013

12/10/2013: WLASze Part 1: Weekend Links on Arts, Sciences and zero economics

This is the first WLASze: Weekend Links on Arts, Sciences and zero economics instalment for this weekend. It covers the sciences, so it is a bit heavy on some topics. Enjoy.


Starting with some very, very old stuff: according to Russian researchers, the meteorite that exploded above the Russian city of Chelyabinsk (and on YouTube screens) in February was about 4.56 billion years old, or as old as the Solar System itself.
http://en.ria.ru/science/20131004/183951992/Russian-Meteorite-as-Old-as-Solar-System--Scientist.html
Infographic with some details on meteorite impact is available here: http://en.ria.ru/infographics/20130215/179495177/Meteorite-Fragments-Hit-Russia.html


A cool, quick (and simple) list of the top 5 most important physics discoveries of the last 25 years via BusinessInsider… oh, and they throw in 5 future discoveries that are likely to change the world too:
http://www.businessinsider.com/top-5-modern-physics-discoveries-2013-10
My personal favourites: measuring the neutrino mass using Japan's Super-Kamiokande neutrino detector… archi-cool… and from the futures list - quantum computing…

While on physics and sciences - Nobel Prizes this year:
Chemistry: http://physicsworld.com/cws/article/news/2013/oct/09/chemistry-nobel-honours-trio-who-combined-classical-and-quantum-physics
Physics: http://physicsworld.com/cws/article/news/2013/oct/08/englert-and-higgs-bag-2013-nobel-prize-for-physics
Physiology or Medicine: http://www.theguardian.com/science/2013/oct/07/nobel-prize-medicine-cell-transport-vesicles
All worthy, in my view, unlike this year's Nobel Peace Prize. The 2013 Peace Prize is a bit of a dodo, to be honest, just like some previous ones: http://www.businessinsider.com/12-worst-nobel-peace-prize-winners-2013-10. In this category in general, the Nobels are often given for uninspiring or bizarre reasons.
The Literature Prize, also too often given for political reasons or out of obscure complexity and academism, went this year to a seemingly worthy recipient: http://www.nytimes.com/2013/10/11/books/alice-munro-wins-nobel-prize-in-literature.html?_r=0

We are, obviously, holding our breath for the Economics 'Nobel', to be announced come Monday. My bets are in with a number of news outlets, but I'd rather keep them off the blog, as I generally prefer to avoid making predictions...


On a lighter (only slightly) scale of things: for aspiring physics fans: Physics World at 25 puzzle page: http://blog.physicsworld.com/category/physics-world-at-25-puzzle/


In continuation of the links I posted last week on the merger of materials science, human-tech interfaces and new tech development, here's an article about the latest discoveries in the metal composition area, showing the shape-changing properties of metal crystals: http://www.bbc.co.uk/news/science-environment-24400101
And while on it: an article on 'smart' fabrics: http://www.bbc.co.uk/news/technology-20799344
And wearable tech: http://news.bbc.co.uk/2/hi/technology/7241040.stm
See my original links on the topic of 4D printing here: http://trueeconomics.blogspot.ie/2013/10/4102013-wlasze-part-1-weekend-links-on.html
These have now been incorporated into my talk on the Human Capital-centric world and technological enablement, which I will be delivering early Monday at the Economic Forum / The Gathering-linked event in Ireland, hosted by the Irish-American biotech company Alltech.


Talking of Irish researchers, we had some brilliant news out of TCD recently: http://www.belfasttelegraph.co.uk/news/local-national/republic-of-ireland/irish-scientists-in-solar-storms-breakthrough-29641467.html#sthash.p2oewCS2.BzoMB1YE.uxfs Basically, Trinity College researchers "have shown -- for the first time -- a direct link between solar storms, caused by explosions on the sun, and solar radio bursts, which cause the potentially dangerous communications disruptions on Earth."


The complex inter-relationship between observations, data collection and data analytics exemplified by the TCD research mentioned above is, however, much more manageable than the data conundrums presented by ever-growing social data flows. Here is an excellent exposition of the problem: http://www.wired.com/wiredscience/2013/10/topology-data-sets/
The problem is not the size of the data we are getting, but "the sheer complexity and lack of formal structure". Put differently, in comparison with physics: "In physics, you typically have one kind of data and you know the system really well," said DeDeo. "Now we have this new multimodal data [gleaned] from biological systems and human social systems, and the data is gathered before we even have a hypothesis." The data is there in all its messy, multi-dimensional glory, waiting to be queried, but how does one know which questions to ask when the scientific method has been turned on its head?

And a related article: http://www.wired.com/wiredscience/2013/10/big-data-science/

Stay tuned for arts posting later today.


Saturday, October 13, 2012

13/10/2012: Big Data, Fast Data, Protected Efficiencies


A very interesting (and revealing, in terms of the scope of future strategies) set of videos from Goldman Sachs' Co-Head of Internet Banking (link here). The basic idea is that data analytics and the drive for efficiency, both via innovation, are the core sources of value in the years ahead.

Main factors:

  • Data supply/generation: the high volume of generated data is an opportunity (nothing new here, though) for a 'bifurcation' between the business models that will grow and those that won't.
  • Proprietary data is the king.
  • Internal human capital (knowledge) to capture this advantage is the king-maker.
  • Enablement of this data via consumer-provider links is the battleground.

Frankly speaking, these are received wisdoms.

  • Capacity utilization drives efficiency.
  • The rate of this drive/change/disruption is massive.

Worth a watch!