Semantic Strategies

Have you ever wondered what separates the UK Regional Development Agencies? Are all the Regional Economic Strategies (RES) the same? Who would disagree with the promotion of higher incomes, investment in innovative technology and cohesive communities? How do the unique assets and histories of each UK region affect its RES? How can every region pursue higher-than-average output?

This analysis answers some of these questions by delving into semantic technologies. I present a visualisation of keywords (a tag cloud) and a statistical analysis of keywords (n-gram ranking).

Tag Clouds

The information contained within is drawn from Regional Economic Strategies and the Regional Development Agencies' own websites. This data has been recorded on this site in a web-book on Regional Economic Strategies. Tags have been applied to excerpts of the text and then aggregated.

The RES book is not comprehensive: it includes only a few of the strategies. The analysis is nonetheless illustrative.

This tag cloud illustrates the frequency of economic development topics across the regions:

Advanced Technology Assets Attractive built environment Business City-Regions cohesive communities community communties Competition Competitive Advantage Conservation Culture Deprived Areas Dynamism Economy Education Efficiency Employment energy enterprise environment equality Global Competitiveness Growth GVA health inclusion Income Infrastructure Innovation International Economy Investment Inward Investment Jobs Knowledge land Learning Liveability People Physical Private Investment Productivity Quality of Life Regional Economic Strategy renewal Rural Safeguard Skills Social Justice Society Sustainable Development Sustainable Prosperity Towns Training Transport Unemployment Urban Work World Class

N-Gram Ranking

An n-gram is a sequence of co-located words, where "n" is the number of words. In this analysis we compare pairs of words, or "bigrams".

I've used Perl modules from the N-Gram Statistics Package to parse the text. The script looks for pairs of words found within a four-word window. Some common conjunctions ("and", "or", etc.) have been ignored. The frequency of each bi-gram (the memes in the RES) is counted in a histogram. The bi-grams are then ranked by log-likelihood ratio.
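For readers who want to try something similar, here is a minimal Python sketch of the counting and ranking steps described above. It is not the N-Gram Statistics Package code (which is Perl). The stoplist, the function names and my reading of "within a four-word window" (the second word is at most three tokens after the first) are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stoplist (assumption): the original list of ignored words is not given.
STOPWORDS = {"and", "or"}

def tokenize(text):
    """Split text into alphanumeric tokens and drop stop words (case is kept)."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t for t in tokens if t.lower() not in STOPWORDS]

def window_bigrams(tokens, window=4):
    """Count ordered pairs (w1, w2) where w2 follows w1 within `window` tokens."""
    counts = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + window]:
            counts[(w1, w2)] += 1
    return counts

def log_likelihood(n11, n1p, np1, npp):
    """Log-likelihood ratio (G^2) for a 2x2 table of bigram counts.

    n11: count of the bigram; n1p: bigrams starting with w1;
    np1: bigrams ending with w2; npp: total number of bigrams.
    """
    observed = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def rank_bigrams(text, window=4):
    """Return (bigram, score) pairs sorted from highest to lowest score."""
    counts = window_bigrams(tokenize(text), window)
    npp = sum(counts.values())
    left, right = Counter(), Counter()
    for (w1, w2), n in counts.items():
        left[w1] += n
        right[w2] += n
    scores = {
        (w1, w2): log_likelihood(n, left[w1], right[w2], npp)
        for (w1, w2), n in counts.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `rank_bigrams(text)[:10]` on the text of one strategy would give a top-ten list of the kind shown here, although the exact scores will depend on the tokenisation and stoplist used.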

The top ten bi-grams for each RES are as follows:

AWM | EEDA | EMDA | LDA | NWDA | ONE | SEEDA | SWRDA | YF
West Midlands | East England | East Midlands | London s | England average | North East | South East | South West | Yorkshire Humber
region s | per cent | sub area | GLA Group | Themed Chapters | One NorthEast | Regional Economic | region s | Northern Way
low carbon | Regional Strategy | region s | ECONOMIC DEVELOPMENT | Lake District | set out | Economic Strategy | Regional Strategy | such as
climate change | Economic Strategy | : PSA | ECONOMIC STRATEGY | Economic Strategy | region s | Regional Strategy | Regional Economic | long term
Regional Strategy | Regional Economic | Evidence Base | DEVELOPMENT STRATEGY | Regional Strategy | ECONOMIC STRATEGY | region s | Economic Strategy | e g
R D | region s | such as | EVIDENCE BASE | Economic 2006 | REGIONAL STRATEGY | Smart Growth | southwestrda org | region s
working age | such as | Local authorities | Sustaining Success | Strategy 2006 | REGIONAL ECONOMIC | quality life | 2006 2015 | REGIONAL ECONOMIC
such as | R D | long term | such as | page Themed | Tees Valley | sustainable prosperity | Economic 2015 | YORKSHIRE HUMBER
per head | climate change | Regional Strategy | Sustaining Developing | page Chapters | such as | New Action | Strategy 2015 | 2006 2015

The list throws up some interesting results:

  • "Climate Change" is only a top 10 issue for AWM and EEDA (NB: just because these RDAs mention the term frequently doesn't mean that other RDAs aren't as concerned, if not more concerned, about the environment).
  • "The Northern Way", an agreement between the three northern RDAs, is only in YF's top 10.
  • "Long Term" thinking is only apparent in EMDA's and YF's top 10.

The lists demonstrate that my method needs some work. Ideally we would remove the stray "s" (apparently left over from possessives such as "region's") and stylistic elements like "such as", but I'm satisfied with the results for the time being.
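A possible clean-up step is sketched below in Python. It assumes the stray "s" tokens come from possessives being split at the apostrophe, and it uses a small hand-picked set of stylistic bi-grams. Both the set and the function names are hypothetical, and this is not part of the N-Gram Statistics Package.

```python
import re

# Hypothetical list of stylistic bi-grams to discard (lower-cased).
STYLISTIC = {("such", "as"), ("e", "g")}

def clean_tokens(text):
    """Strip possessive endings ("region's" -> "region"), then tokenise."""
    text = re.sub(r"[’']s\b", "", text)
    return re.findall(r"[A-Za-z0-9]+", text)

def drop_stylistic(bigrams, stylistic=STYLISTIC):
    """Remove bi-grams found in the stylistic set, ignoring case."""
    return [
        (w1, w2)
        for (w1, w2) in bigrams
        if (w1.lower(), w2.lower()) not in stylistic
    ]
```

Stripping the possessive before tokenising means "region's" counts towards "region" rather than producing a separate "s" token.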

The rankings of different RES documents are compared using Spearman's rank correlation coefficient. The correlation matrix of RES memes is as follows.

      | eeda   | emda   | lda    | nwda   | one    | seeda  | swrda  | yf
awm   | 56.12% | 56.19% | 54.79% | 54.78% | 57.08% | 55.22% | 58.18% | 57.66%
eeda  |        | 57.79% | 57.41% | 57.88% | 60.08% | 61.09% | 57.60% | 59.15%
emda  |        |        | 55.47% | 54.46% | 59.50% | 57.90% | 55.08% | 60.00%
lda   |        |        |        | 51.24% | 58.64% | 57.58% | 54.19% | 58.28%
nwda  |        |        |        |        | 55.76% | 57.63% | 52.97% | 56.33%
one   |        |        |        |        |        | 59.18% | 56.99% | 62.51%
seeda |        |        |        |        |        |        | 58.90% | 58.96%
swrda |        |        |        |        |        |        |        | 54.89%
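To make the pairwise comparison concrete, here is a minimal Python sketch of Spearman's rank correlation between two scored bi-gram lists. It assumes that only bi-grams present in both documents are compared and that tied scores share an average rank. The rank module may handle either point differently, and the function names are my own. The coefficient always lies between -1 and +1.

```python
import math

def average_ranks(values):
    """Rank values from highest (rank 1) to lowest, averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all ranks are equal")
    return cov / math.sqrt(var_x * var_y)

def spearman(scores_a, scores_b):
    """Spearman's rho over the bi-grams that two {bigram: score} dicts share."""
    shared = sorted(set(scores_a) & set(scores_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared bi-grams")
    ranks_a = average_ranks([scores_a[k] for k in shared])
    ranks_b = average_ranks([scores_b[k] for k in shared])
    return pearson(ranks_a, ranks_b)
```

Running `spearman` on every pair of documents would fill in a triangular matrix like the one above.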

Since the correlation coefficients cluster around the "half-similar" 50% mark, I've prepared another matrix that subtracts 0.5 (50 percentage points) from each. The matrix below clearly shows that the most similar RESs are those of neighbours Yorkshire Forward & One North East, and of the South East & East of England Development Agencies. The least similar pairs both involve the Northwest Development Agency, matched with its distant cousins in London and the South West.

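      | eeda | emda | lda | nwda | one | seeda | swrda | yf
awm   | 6%   | 6%   | 5%  | 5%   | 7%  | 5%    | 8%    | 8%
eeda  |      | 8%   | 7%  | 8%   | 10% | 11%   | 8%    | 9%
emda  |      |      | 5%  | 4%   | 10% | 8%    | 5%    | 10%
lda   |      |      |     | 1%   | 9%  | 8%    | 4%    | 8%
nwda  |      |      |     |      | 6%  | 8%    | 3%    | 6%
one   |      |      |     |      |     | 9%    | 7%    | 13%
seeda |      |      |     |      |     |       | 9%    | 9%
swrda |      |      |     |      |     |       |       | 5%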

Again, this method isn't perfect. First, it would benefit from improved input data, as described above. Second, we could extend the analysis by testing the correlation coefficients for statistical significance, to confirm whether the regions really are similar or different.
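One way such a significance test might look is a permutation test, sketched below in Python. This is a suggestion rather than something I have run on the RES data. It assumes two tie-free rankings of the same set of at least two bi-grams, and the function names and default parameters are illustrative.

```python
import random

def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def permutation_p_value(rank_a, rank_b, n_perm=10000, seed=0):
    """Two-sided p-value: how often does a random re-ordering of rank_b
    correlate with rank_a at least as strongly as the observed ordering?"""
    rng = random.Random(seed)
    observed = abs(spearman_rho(rank_a, rank_b))
    shuffled = list(rank_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(spearman_rho(rank_a, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value would suggest that the agreement between two rankings is unlikely to be chance alone. It would not say anything about why the documents agree.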

Moreover, the N-Gram method itself isn't an ideal means of comparison. Much of the similarity between the RESs is inevitable: all the documents are written to a similar brief. The analysis is also prone to bias, since similar writing styles do not always mean similar content. Indeed, the N-Gram Statistics Package that I've used was not designed to compare different documents. Its rank module is supposed to be used to compare the statistical rankings assigned by different measures, e.g. log-likelihood vs. pointwise mutual information.

Please let me know if you'd like me to continue with the analysis, perhaps with the improvements I've hinted at, or by extending the research to cover the Welsh Development Agency or Scottish Enterprise. You may have your own recommendations. If you've got a more serious requirement, I'm open to commissions!
