Joshua Fennessy

Two New Opportunities to Learn Hadoop and Spark

When I started learning Hadoop, the path to success wasn’t clearly defined. Since Hadoop hadn’t yet entered mainstream enterprise deployments, the use cases weren’t as clear and the technologies to focus on weren’t obvious.

Today, things are a lot different. Hadoop has been maturing at a steady pace, and vendors like Microsoft are providing easily workable platforms like HDInsight.

A couple of years ago, I scoffed at HDInsight as a half-completed project that wasn’t very deployable for real-world solutions. My view has completely changed in the last 12 months thanks to the hard work that Microsoft and Hortonworks have put into growing the platform.

In its current state, HDInsight is a great platform for getting started with Hadoop, and because of this, I’ve put together a one-day class to kick-start your training.

In this one-day course, I focus on the important details you’ll need to get started building a solution with Hadoop. After reviewing some real-world use cases that I’ve implemented using HDInsight, I take you on a tour of the environment.

About half to two-thirds of the day is focused on workshop-style instruction, with lots of code examples to take home at the end of the day. I zoom in on three major tools: Hive, Spark, and Sqoop. While there are other things to learn before you’re totally production ready, these are the three tools I use on nearly every project.

If Hadoop experience is on your list this year, but you don’t quite know where to get started, please consider joining one of my upcoming sessions this year. There are two choices available right now, with more to come!

Friday Aug 12, 2016 - SQL Saturday Indianapolis - Early Bird Price $100
Click here to purchase tickets

Thursday September 8, 2016 - SQL Saturday Cambridge (UK) - Early Bird Price £140
Click here to purchase tickets

Are you hosting a SQL Saturday event, or other community based training day and would like to include a full day session on Hadoop?  Connect with me on Twitter or LinkedIn and let me know what you would like to see.

Update #1 – Making the Case for Hadoop

A couple of weeks ago, I introduced you to an idea I had about testing the viability of Hadoop and/or Spark solutions for ETL.  I planned out a real-world type ETL problem, with what I hoped would be real-world scale.

Since that introduction, I’ve had the opportunity to work on the first part of the testing process, the SSIS execution.  Keep reading for details on the test plan and the results of the execution.

You’ll remember from the initial post that I’m looking to ingest a bunch of weather data from NOAA, perform some transformations on that data, and write it back to a SQL Server for later analysis.  This procedure mimics many ETL scenarios that are implemented every day.

For this test, I downloaded ALL of the NOAA_daily archive data — nearly 90GB of uncompressed CSV files. Not HUGE data, but not a small amount either. It’s enough to offer a good look at the comparison between SMP solutions like SSIS and MPP solutions like Apache Spark.

This CSV data is well formatted and didn’t require much cleansing for my purposes. One big task I needed to complete was a pivot of the data so that one output row corresponded to a single station/day combination. I also filtered the data down to the following measurements (a rough code sketch of this step follows the list):

  • Total Precipitation
  • Total Snowfall
  • Min Temp
  • Max Temp
  • Avg Temp
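
Expressed as code rather than SSIS data-flow components, the shape of that pivot-and-filter step looks roughly like the sketch below. It’s written against the Spark 1.6 DataFrame API purely for illustration (the Spark version of this job comes later in the series). The file path is a placeholder, and the column names are assumptions: the raw GHCN daily CSVs have no headers, so a real job would map columns by position.

// Rough sketch of the pivot-and-filter step (Spark 1.6 DataFrames).
// sc and sqlContext come from spark-shell. Reading CSV this way needs the
// spark-csv package, e.g. spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
import org.apache.spark.sql.functions._

val daily = sqlContext.read
  .format("com.databricks.spark.csv")
  .load("/data/noaa/daily/*.csv")                       // placeholder path
  .toDF("station", "obs_date", "element", "value",
        "m_flag", "q_flag", "s_flag", "obs_time")       // assumed column layout

// Keep the five measurements of interest, then pivot so each
// station/day combination becomes one wide row.
val wide = daily
  .filter(col("element").isin("PRCP", "SNOW", "TMIN", "TMAX", "TAVG"))
  .groupBy("station", "obs_date")
  .pivot("element", Seq("PRCP", "SNOW", "TMIN", "TMAX", "TAVG"))
  .agg(first("value"))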

I performed a second pivot to also store the time each of these events was recorded.  The SSIS package I created ended up looking something like this:

[Screenshot: the SSIS package data flow]

The NOAA data is stored in a single file for each year starting with 1796 and ending with 2016. The SSIS package was set up to iterate through each file one at a time — a common method when ingesting lots of CSV files with SSIS.

The environment I executed this package on contained three individual machines:

1 ETL Server (Virtual)

  • SQL 2014 SP1
  • 4 vCores
  • 32GB RAM
  • Package deployed to SSISDB

1 File Server (physical)

  • All CSV files stored in a single directory
  • ~90GB uncompressed

1 SQL Server (Physical)

  • SQL 2014 SP1
  • 8 physical cores
  • 96GB RAM
  • RAID 10 array for logs
  • RAID 10 array for data
  • Database instantiated with 100GB data and log files

Network

  • All machines connected via a 1 Gbps LAN
  • Located in one rack

 

The Results

Honestly, when this package started, I thought it was going to move pretty fast!  The files for the earliest years in the NOAA data are quite small, and the package was flying through them.  Once it hit the mid-1900s, however, everything started slowing down.  Those files are over 1GB in size, and the sorts (a blocking transformation) required before each pivot, and again before the merge join, really took a toll.

The final data landed in a table that looks like this:

[Screenshot: the destination table schema]

Since I executed this from SSISDB, I was able to monitor progress using the SSIS reports — they are a pretty handy tool in modern editions of SSIS.

[Screenshot: the SSIS execution report]

In the end, the package execution took 61,645 seconds end to end, which works out to about 17.1 hours. Honestly, that’s a bit longer than I originally expected; my estimates were more in the 8 or 9 hour range.

The next step

Now that the SSIS package and testing are complete, I’ll be moving on to the Spark development and execution.  In a couple of weeks, I’ll share the results of that test. I’m really excited to see how it works out — I’m a bit nervous about the performance of Spark writing data directly to SQL Server. I’m pretty sure Spark will churn through the CSV files faster than SSIS, but I think the database writes are going to be a big bottleneck.
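
For the curious, the write path I’m nervous about looks something like the sketch below (Spark 1.6 era). The JDBC URL, credentials, and table name are placeholders, and wide stands in for whatever pivoted DataFrame the Spark job ends up producing.

// Sketch of writing a Spark DataFrame to SQL Server over JDBC.
// Everything below (URL, credentials, table name) is a placeholder.
import java.util.Properties

val props = new Properties()
props.setProperty("user", "etl_user")
props.setProperty("password", "********")
props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

// wide is the pivoted station/day DataFrame produced earlier in the job.
// Each partition opens its own JDBC connection, which is exactly why the
// database writes could become the bottleneck.
wide.write
  .mode("append")
  .jdbc("jdbc:sqlserver://sqlbox:1433;databaseName=NOAA", "dbo.DailyWeather", props)

Coalescing the DataFrame to a smaller number of partitions before the write is one way to keep the number of concurrent connections hitting SQL Server under control.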

A Few Tips for Getting Started with Apache Spark

Having been primarily a Microsoft SQL BI architect for the last few years, I struggle to call myself a developer.  Close to 20 years ago I wrote Java, and 15 years ago, .NET — but since then, the amount of code I’ve *really* slung has been pretty minimal.  Sure, I’ve written lots of SQL and some scripting here and there, but it’s all been comparatively light.

For the past 3 years, I’ve been focused on Hadoop, and that’s required me to dip my toes into the development arena here and there — but still, I’ve primarily stayed in the scripting language camps — until now.

I’ve seen the light, and the light is Apache Spark — I’ve said it before, and I’ll probably say it many more times, Apache Spark is going to revolutionize how we build Big Data solutions, and how we approach Modern Data Warehouse projects in the future.

I’m really excited about Apache Spark, and I hope you are too.  But if you’re like me, transitioning from a primarily SQL-focused mindset to a programming framework like Spark isn’t going to be easy.  If you really want to work with Spark, you’ll need to pick up some Scala.  Spark also supports Python and Java, but Scala is the de facto language of Spark. You can probably get away without learning it for quite a while if you know Python, but I bet you’ll eventually come across something that requires some Scala.

Let me be clear about something up front. I don’t know Scala.  At least, I don’t know enough of it to call myself proficient. Sure, I’ve written a couple of Spark applications — but in my mind, they are really simple, and I wouldn’t feel comfortable getting a Scala tattoo quite yet.

Tip #1 – Don’t be scared

With the latest release of Apache Spark, 1.6.1, you’ll have access to a mature DataFrames API — what does this mean? Well, for the most part, it means that the Apache Spark engine has matured enough to include a built-in optimizer, so you don’t have to be a Scala guru to write well-performing code. Yes, all of you Scala masters, I realize that this may introduce some sloppy code to the ecosystem, but I think opening up the APIs to be accessible to a larger group of developers is a good thing. In my opinion, the more people we can get using Apache Spark, the better the world will be.

Secondly, using the Spark DataFrame API means that the code that you will write will be somewhat readable by a TSQL expert. It’s still Scala code, but since DataFrames don’t use lambda functions, it’s much more readable to someone that is used to writing SQL code.

Speaking of writing SQL code: if you wanted to, you could write your Spark DataFrame application entirely in SQL — the optimizer will make sure it runs just as well as anything else.
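
Here’s a small illustration of what I mean, assuming a hypothetical sales table has already been registered with the sqlContext that spark-shell provides. Catalyst produces the same plan for both versions, so pick whichever reads better to you.

// The same question asked two ways against a hypothetical "sales" table.
// sqlContext comes from spark-shell (or is created in your application).
import org.apache.spark.sql.functions._

val sales = sqlContext.table("sales")

// DataFrame API: still Scala, but no lambdas, so it reads much like a query
val byRegion = sales
  .filter(col("year") === 2015)
  .groupBy("region")
  .agg(sum("amount").as("total_sales"))

// The same thing in plain SQL
val byRegionSql = sqlContext.sql(
  "SELECT region, SUM(amount) AS total_sales FROM sales WHERE year = 2015 GROUP BY region")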

Tip #2 – Ditch the IDE

When I first started looking at Spark, my first action was to download Eclipse. My second action was to stare at the Eclipse splash screen and think “well, what the hell do I do now?”  It was a mistake.

If you’re not already a Java or Scala developer, you don’t need an IDE to work with Spark when you are just learning; remember, learning Spark really means that you’re learning Scala too. Actually, I’d postulate that you will have a MUCH better learning experience if you ignore the need for an IDE and just focus on your data.  Getting to the IDE will come in time, and by the time you find you need it, you’ll be comfortable enough with Spark and Scala that setting it up won’t be a big deal.

There are a couple of great options for interacting with Spark without building a custom JAR and using spark-submit. The first one comes with your Spark installation: spark-shell.

Spark-shell is a command-line tool that drops you into a Spark command prompt. The prompt will accept Scala line by line and will evaluate each line as you enter it. This is great when you’re just learning the syntax.  Additionally, spark-shell takes care of a bunch of housekeeping stuff — all of which you’d have to do on your own in an IDE.  The downside with using spark-shell is that the code you enter isn’t saved anywhere (except the shell history), so you may find yourself re-entering lines of code between sessions.
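
As an example, a first spark-shell session might look something like this sketch. The file path and column layout are made up for illustration; sc and sqlContext are created for you when the shell starts.

// Poking at a single year of data from the spark-shell prompt.
val lines = sc.textFile("/data/noaa/daily/2015.csv")   // placeholder path

lines.count()                       // how many observations were recorded?
lines.take(5).foreach(println)      // peek at a few raw rows

// Promote the raw text to a DataFrame and query it with SQL
import sqlContext.implicits._
case class Obs(station: String, obsDate: String, element: String, value: Int)

val obs = lines.map(_.split(","))
  .map(f => Obs(f(0), f(1), f(2), f(3).toInt))
  .toDF()

obs.registerTempTable("obs_2015")
sqlContext.sql("SELECT element, COUNT(*) AS readings FROM obs_2015 GROUP BY element").show()

None of this survives the session (beyond the shell history), which is exactly the trade-off mentioned above.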

Another great option is to use a notebook.  If you’re using a Hadoop image like the Hortonworks HDP Sandbox, then you’ll probably have access to Jupyter.  Jupyter is a web-based tool for building living documents.  Basically, you get a web page with an open cell. In this cell you can enter all sorts of code — Scala, Markdown, Python, SQL, bash, etc. — and it will be evaluated and presented in the browser.  Depending on your Jupyter installation, you may also have access to a web-based terminal, allowing direct access to the machines in your cluster.

Tip #3 – Focus on the right content

The Spark API is not small — there are multiple paths to look at.  Based on a recent class I attended at Strata+Hadoop World in San Jose, my recommendations are:

  • Spend 95% of your learning time on DataFrames. This will get you exposed to the DataFrame API and Spark SQL. Use SQL when you can — the Spark execution engine will handle optimizations for you.  RDDs are not where your time should be focused. While DataFrames don’t support lambda functions (a function executed for each row of data in the set), you can use Datasets to get that functionality without dropping back to RDDs.  RDDs are considered too low level for most developers to worry about at this point in Spark’s maturity.
  • Use HiveContext, org.apache.spark.sql.hive.HiveContext, instead of the basic Spark SQL context.  The Hive SQL parser is better than Spark’s, and you don’t need to be running Hive to use it — it works just fine without a Hive installation. Unofficially, the rumor is that the Spark SQL context in its current form will go away and be replaced with HiveContext in the future. (A minimal sketch follows this list.)
  • Marketing loves to talk about Spark Streaming, but in practice, it’s not quite mature yet. Look at it, but don’t focus on it as the only streaming solution available. Spark’s bread-and-butter use case is still ETL and batch data processing.
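
To illustrate the HiveContext recommendation, here is a minimal sketch for Spark 1.6. The observations table is hypothetical, and in many builds the sqlContext that spark-shell hands you is already a HiveContext.

// Creating and using a HiveContext (Spark 1.6). No Hive installation is
// required; you simply get the richer HiveQL parser.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)   // sc comes from spark-shell or your app

// "observations" is a hypothetical registered table, used only for illustration.
val warmest = hiveContext.sql(
  "SELECT station, MAX(value) AS max_tmax FROM observations WHERE element = 'TMAX' GROUP BY station")

warmest.show(10)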

Tip #4 – Work through tutorials and get to know the documentation

There are many Spark tutorials available — all of the major Hadoop vendors (Hortonworks, Cloudera, MapR) have Spark tutorials to work through. They are fine places to start and to get your hands on some data.  But the real content to focus on is the API documentation. Spark has really good documentation, and the API docs will be very helpful as you explore how to use DataFrames to process data.

Additionally, bookmark spark-packages.org. As you work with different types of data, check there often to make sure the task you’re trying to figure out hasn’t already been solved by someone else.  There is a vibrant community at spark-packages; chances are you’ll find what you need there.

Tip #5 — Don’t give up!

If you’re not a developer (like me), it will be hard to get started, but keep at it!  Spark isn’t nearly as daunting as it looks at first glance — follow the tips above and you’ll have a better experience than I did when I got started.

Making the case (or not) for Hadoop and Spark

For the past several years, we’ve been inundated with messages about how Hadoop and similar technologies (most recently, Spark) will make our data processing jobs so much faster. There have been many controlled tests, sort-benchmark results, and other benchmarks showing that open-source-project-A is orders of magnitude faster than open-source-project-B, and so on.

But, what about real world usage?  That’s exactly what I’m looking to find out. I’ve been told, and I HAVE told customers of mine that processing large data sets in a distributed environment is faster, but I’m not often given the opportunity to spend billable hours building two solutions to provide actual proof that those cases where Hadoop really is faster exist.

And, let’s be honest, it’s often a hard story to stand behind, especially when a customer loads up a bunch of data in Hive and runs a query, then waits for minutes (or even hours) for that data to be returned; the same query, of course, runs in seconds on SQL Server… boy, does that make me look like a fool (thanks a lot, Hive!)

What’s the plan?

The plan, to be executed over the next few weeks, is to build two ETL solutions that meet the same requirements: one on SMP SQL Server using SSIS, and one on Hadoop/Spark.

Why SQL Server?

Traditionally, I’m a BI guy. I deal with data warehouses (and marts) most of the time, and I rarely work on OLTP systems. Most of my customers are working on Microsoft-based systems using SQL Server and SSAS.  So, because I want to be able to use this as a story for BlueGranite’s customers, I’m choosing to use a common system.

Why Hadoop and Spark?

I’ve been focusing on Hadoop for nearly 3 years now, and as a company, BlueGranite has been cultivating a budding Big Data practice for two years.  We’ve seen massive growth of this practice in 2016 already — and it’s showing no sign of slowing down.  Hadoop is maturing to a level where our customers are ready to trust it. Spark is also maturing and is very quickly taking over batch processing in Hadoop.  I’m making it my goal to write zero Pig Latin in 2016 (and beyond).

Did I just say “requirements” to myself?

Yup. I did.  Requirements are important for any project.  For this effort, I’m going to be using a dataset available from NOAA. Specifically, the NOAA daily dataset. Why this one?  Well, frankly, it’s a pretty easy dataset to work with, and for this test, I’m not concerned with the complexity of the data processing.  It’s also probably big enough — it’s about 24 GB compressed, about 2.4 billion rows. I think that should be big enough to show a difference between SMP and MPP processing.

Ok, identifying the dataset is great, but those aren’t requirements.  Right, well, here are my requirements:

  • Source data is stored in delimited files
  • The data needs to be stored as a single row per station per day
    • Measurements should be stored as columns for easy consumption with visualization tools
  • The resulting dataset needs to include geographical information
    • It’s weather, so a map visualization kind of makes sense
  • Time series analysis must be possible with the resulting dataset
    • I’ll want to be able to do Year over Year comparisons, etc
  • The data must end up in a SQL Server table

What about the environment? Where am I going to build this stuff?

I bet you think I’m going to say the Cloud, right? Well, I normally probably would, but in this case, I’m not.  A couple of reasons why:

  • I want to test this on bigger hardware than my meager Azure credit can afford
  • I happen to have this hardware in my physical home office, and am excited to have a use case for it🙂

I don’t have a huge server farm in my home office, but I have a few decent boxes that I’m going to use for this.

SQL Server

My SQL Server is going to run on a 16-core/96GB RAM TrendMicro server that I have on hand. It has about 6TB of disks configured into a couple of RAID 10 arrays.  It won’t be completely production compliant, but I’ll at least be able to keep data and logs on different spindles and have enough space to allocate the databases appropriately.  This machine will also run the SSIS ETL.

Hadoop/Spark

My Hadoop cluster is made up of two other servers. The first is a smaller Dell R410 (8 cores/64GB) that runs a virtualized namenode and a virtualized MySQL database server for Hadoop metadata.  My data nodes are virtualized and run on one of the ill-fated Dell QuickStart Data Warehouse machines.  I would normally try to have separate physical data nodes, but this server allows for some interesting Hadoop virtualization.  The QSDW box is a 16-core/96GB server with 4 NICs and 27 500GB 10,000 RPM SCSI drives — why is this good for Hadoop? It means I can run 3 data nodes, each with a dedicated NIC and 6 dedicated drives for HDFS, so I can get true parallel HDFS reads and writes with virtualized servers. While not as good as true physical servers, it ends up being a pretty good environment to run Hadoop on.

What sort of timeline is this going to happen in?

Great question — I hope to be able to start when I get back home in early April.  I’m travelling around quite a bit right now, so I imagine it will take me a few weeks to complete all of the development and testing.

What will I find out?

I don’t know! That’s the best and/or scariest part.  I might find out that SSIS/SQL Server is way better for processing gigabytes of files. I may find out that we should run Hadoop on our phones because it’s so great!  I really have very few expectations. I do feel like the Hadoop/Spark solution is going to be faster than the SQL solution for this case — but not enough to bet on it.

5 Questions with David Klee

I recently had an opportunity to sit down with David Klee (blog | twitter | linkedin), SQL Server MVP, VMware vExpert, and all-around nice guy. David and his family are currently in the middle of a cross-country move, so I greatly appreciate the time he was able to spend on his responses.

Without further hesitation, here’s David:

Josh: What new technology are you most excited about right now?

David: Oh wow, where do I start? My biggest challenge is that I want to learn everything about everything. From a technology side, I’m most excited about the adoption of flash storage into the datacenter. Used right, it has the power to completely change the performance characteristics of any SQL Server. It can be used for I/O caching underneath them. Workload characteristics can be completely changed by utilizing it. Licensing costs can be reduced through using it wisely. Business problems can be solved with it. The impact to the infrastructure itself, from elevated CPU consumption, to storage interconnect utilization, to the challenge of adjusting the way SQL Server is *using* storage to make the system even more efficient and perform better, is amazing. I want to learn more about it!  (Yeah, I’m a serious geek, I know.)

Josh: We all have a million things to do. How do you organize your tasks?

David: I’m a huge proponent of lists. If I didn’t have my lists, I’d miss so many things that need to get done that I’d probably fall apart. I have three lists – to do today, to do this week, and to do soon. The lists are good old pen and paper. Every morning I re-write each list so that I can re-prioritize what’s most important, add things that have come up, and remove things that I got done yesterday. Re-writing it makes me think about it as I write it, and I can re-prioritize as I write. I then use the list of things to do today to help me manage my time for that day. Come tomorrow, rinse and repeat.

Josh: What new things are you learning?

David: I’m starting to work on learning some advanced statistics for high performance computing and data visualization techniques. Every IT platform and system, databases included, has a number of performance challenges. Measuring them is just the start of the process of getting the meaning out of the raw numbers, because most folks just live with “it’s slow” and have trouble putting objective metrics to subjective observations. Even if they have performance monitoring systems out there, many of the utilities today provide a watered-down view that removes much of the meaning of the data. Showing the correlation of system resource consumption and the visual impact on other systems (and not just saying “it’s slow”) is an area that is becoming even more critical in today’s datacenter infrastructures, but one that is rarely done in a meaningful manner. I’m working to fix this!

Josh: Out of all of the training events and/or conferences you’ve attended, which has left the most impact?

David: The PASS Summit conference from 2012 was life changing, more so than any other technical or training event that I’ve ever had. I was fortunate enough to co-present a session called “Managing SQL Server in a Virtual World”  with one of my mentors, the legendary Kevin Kline, to a packed room of over 500 people. Having never spoken at any conference before, and being a novice in the SQL Saturday speaking scene, Kevin had sat in on my very first SQL Saturday session ever in October of the previous year. After that, I started to get to know him through the various events where I bumped into him, and that email about the session changed everything. I gained my “sea legs”, so to speak, and the confidence that he had in me to do a good job in the session was enough to give me the confidence to say yes and to actually do it. The session came and went, and I had an absolute blast and people seemed to get a lot out of it!

Since then, the confidence gained from that session with Kevin, and the thrill (strangely enough instead of fear) of speaking in front of that number of people made me want to keep it up. I’ll never assume that I’ll get picked for any other event, but I’ve at least got the confidence to submit and know that I can present the content so that others can learn from my experiences. I’ll always be indebted to Kevin for giving me that first chance at the big leagues!

Josh: What does SQLFamily mean to you?

David: The SQLFamily means quite a bit to me, and has some very personal meanings and impact to me personally and professionally. The worldwide SQL Server community is arguably the most tight-knit and most warmly inviting group of technology enthusiasts in the world. Quite a while back, I started attending the SQL Server Users Group in Omaha, after having tried some other user groups. The other user groups were not especially cold, but I just did not feel welcomed, and the mindset was that while they wanted to share certain aspects of the technology, they just were not really passionate about it. To me, technology is not just a job. It’s a lifestyle. It’s my professional livelihood. It’s my primary hobby that I pursue nights and weekends.

At the first SQL Server users group, I spent two additional hours after the user group wrapped up talking shop with the people who stuck around. I got an email the next day asking me if I was going to be at the next one. At the next meeting, the camaraderie was even stronger and I was asked if I could go to lunch with a group just to hang out. It just grew from there.

To be a bit cliche, the thought that hit me was “I’m home.” Every time I go to a SQL Saturday or PASS Summit, it feels like a family reunion (and I mean that in a good way). This is a group of people who are as passionate about technology as I’ve always been, and have the same mindset of being the best technologists they can be. I’m very proud to be a part of it, and encourage everyone that I meet to become a part of the family.

I’ll put it this way… if it were not for the SQLFamily, I would not have started my own company. Period. This community has supported my dream of starting my own business in ways not possible with other groups. I have some of the best business and personal mentors on the planet from this group. I’ve gotten enough work from word of mouth from this community so that I can pay the bills, and am now scouting for technical people at the various events that I go to so that I know who I want to hire when the time is right. It’s been an amazing journey over the last couple of years, and the SQLFamily made it possible. I want to give back to this community as much as I’ve gotten from it.

Introduction to Hive Complex Data Types – Part 1: Array

In addition to supporting the standard scalar data types, Hive also supports three complex data types:

  • Arrays are collections of related items that are all the same scalar data type
  • Structs are object-like collections wherein each item is made up of multiple pieces of data each with its own data type
  • Maps are key/value pair collections

This first of three related articles will introduce the Array data type that Hive offers.

Hive supports single dimension arrays which are specified during table creation. For the examples in this article, the following table definition will be used:

Defining Array data types and loading data

CREATE TABLE Products
(
id INT,
ProductName STRING,
ProductColorOptions ARRAY<STRING>
);

A simple INSERT statement can be used to get some sample data in the table. Note the ARRAY function is used to load complex data types into the table.

INSERT INTO TABLE default.products
SELECT 1, 'Widgets', array('Red', 'Blue', 'Green')
UNION ALL
SELECT 2, 'Cogs', array('Blue', 'Green', 'Yellow');

In a production environment it’s common to use portable formats like JSON to store array values. If TEXTFILES are used for source data storage, the table can be defined using the COLLECTION ITEMS TERMINATED BY option to specify how the array values should be split.

Querying tables with Arrays

A simple data query can be written that will return all values of the array in a single row:

SELECT id, productname, productcoloroptions FROM default.products;

[Screenshot: query results]

Specific positions can also be referenced when using an array data type to return a single value of the array:

SELECT id, productname, productcoloroptions[0] FROM default.products;

[Screenshot: query results]

Finally, the entire array can be flattened using the LATERAL VIEW EXPLODE statement.  This statement returns a single row for each array value, similar to how a CROSS JOIN works. The syntax for the query is below:

SELECT
p.id
,p.productname
,colors.colorselection
FROM default.products P
LATERAL VIEW EXPLODE(p.productcoloroptions) colors as colorselection;

[Screenshot: query results]

Conclusion

Arrays are powerful data constructs that are often found in LOB applications. With Hive, the array does not need to be broken apart in the ETL/ELT process. Native support for arrays is a powerful feature that separates HiveQL from standard T-SQL implementations.

5 Questions with Jes Borland

One of the great things about the SQL Server community is the open communication we maintain on Twitter. Jes Borland, a friend across Lake Michigan, mentioned that she was looking to kick-start her blog again and wanted to maybe interview someone. I had been wanting to start an interview ‘pen-pal’ club as well, so we connected and made the plan.

Jes Borland (blog | twitter | linkedin) is a Microsoft SQL Server MVP, founder of the Tech-on-Tap Training Series, blogger at LessThanDot.com, and SQL Server Engineer at Concurrency. I recently asked Jes a few questions about work, life, and career. Here are her responses.

Josh: You know that moment when your brain freezes up and you just can’t continue?  How do you get past it?

Jes: I take a walk (or, if it’s really bad, go for a run). If I continue to sit at my desk and stew on it, I’ll end up going over to Twitter and wasting half an hour, or getting lost on some news site. That doesn’t help clear anything up, and then I feel as if I’ve wasted time. But, if I can get away from the computer, move, and think through what I’ve been doing, it helps – a lot!


Josh: We all have a million things to do. How do you organize your tasks?

Jes: I love lists. Love, love, love. But instead of making un-ending lists, I have two. Every day I make a “Three Things” list for the three things I need to do that day. Three items is manageable. Three things is progress. Three things are not overwhelming. Then I have my “For Later” list, which lists other things I’ll work on as time allows.

Josh: If you could change one thing about your office, what would it be?  What if you could change two things?

Jes: If I could change one thing…I’m on the lookout for a new desk. I want a lot of things out of it, though. I’d like it to be an adjustable sit-down/stand-up/walking desk. I’d also like it to have room for two laptop/dock setups (one for work, one for personal). Then, I need drawers. I hate having things sitting out on the desk where I can see them, unless I’m actively working on or with them, so drawers are necessary.

If I could change a second thing about my at-home office, I’d want it to be a separate room above the garage, with its own bathroom and a small kitchenette. After over three years of working from home, I’ve learned to ignore the distractions of a pile of dirty dishes, the book I didn’t finish reading last night, or starting dinner – most of the time. But not all! A separate area to work would be fantastic.


Josh: What new things are you learning?

Jes: In an effort to earn my MCSA in SQL Server, I’m hard at work on Administering Windows Server 2012. I have a background in Windows setup and administration, but there’s been a lot of new stuff that has emerged or matured in the last five years – particularly when it comes to doing tasks with PowerShell. So I’m spending time focusing on server roles, Active Directory, and PowerShell – it’s a ton of fun!


Josh: We all love SQL Saturday. Do you have a memorable moment from a particular event that stands out?

Jes: Gosh. There are so many. So many. I’m going to call out SQL Saturday Chicago 2011, which was the third I attended and the first I spoke at. My session was “Make Your Voice Heard!”, a session on using LinkedIn, Twitter, and forums to increase your SQL Server community presence and network. I was in the room next to Brent Ozar, and his filled up, so I got some overflow. There were a few MVPs and well-known speakers who drifted over. But there were also a few people new to the community who had never blogged or spoken. Over the last four years, I’ve watched people from that session grow into active bloggers and speakers, and I keep up with them on a regular basis, and I couldn’t be happier. That session meant so much to me.


Jes, thank you for taking some time out of your schedule to share with me, and with the rest of the community. I do hope you find your ‘perfect desk’ soon, and GOOD LUCK ON THE MCSA!

p.s.  Jes also interviewed me.  You can read those responses here.