Cocoaheads KRK #29 Swift Intermediate Language – Bartosz Polaczyk
August 14, 2019
Hi! I'm Bartosz Polaczyk, and I also work at Grand Parade. Welcome to a presentation titled "SIL: All you need to know about Swift Intermediate Language". Today I will talk about some low-level things that happen during Swift compilation, and I am aware that not many of you deal with them on a daily basis. That's why I think it is better to first give you an introduction to compilers and to the Swift compiler, so that we know where exactly SIL sits in the compilation workflow. Here is the agenda: as I said, I will introduce compilers, then we will look at how the Swift compiler works on a sample, then we will go into the main part, SIL, and at the end, of course, a summary. So, starting with compilers. I suppose all of you realize that a compiler is not a simple project, and like any other sophisticated application it has to be well organized, with a clear separation of concerns. That's why compilers usually consist of three main parts: front-end, middle-end and back-end. That's quite easy to remember. However, in some compilers the front-end and the middle-end are simply combined and together called the front-end. This is the case with the Swift compiler, where for brevity we just distinguish the front-end, which covers both the front-end and middle-end responsibilities, and the back-end. So what are these parts responsible for?
Starting with the front-end: some languages have a preprocessor (Swift doesn't), which edits your source file in some way before compilation. Then the first compilation step is tokenizing: the compiler takes your source file and splits it into small, simple chunks called tokens. Sample tokens could be an opening curly brace, a quote character, or keywords like if and else, so in general this is quite a simple step. The next step is called parsing: the compiler takes the list of tokens from the previous step and tries to build an AST, an Abstract Syntax Tree. The AST is a one-to-one representation of the code you wrote, for instance in Swift, but in a tree structure that is easier to work with in the further processing steps. At this point the compiler is 100% sure that your code is syntactically correct.
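As an illustration of the parsing step just described, here is a minimal sketch of a recursive-descent parser that turns an already tokenized arithmetic expression into a tree. The Expr and Parser types are hypothetical and far simpler than the Swift compiler's real AST; the point is only that a flat token list becomes a tree whose shape encodes structure such as operator precedence, and that leftover or missing tokens are rejected as syntax errors.

```swift
// Hypothetical, minimal AST for arithmetic expressions (not the Swift compiler's real AST).
indirect enum Expr: CustomStringConvertible {
    case number(Int)
    case binary(String, Expr, Expr)   // operator, left operand, right operand

    // Fully parenthesized rendering makes the tree shape visible.
    var description: String {
        switch self {
        case .number(let value):
            return String(value)
        case let .binary(op, lhs, rhs):
            return "(\(lhs) \(op) \(rhs))"
        }
    }
}

enum ParseError: Error {
    case unexpectedToken(String?)
}

// Recursive-descent parser over a token list; `*` binds tighter than `+`.
final class Parser {
    private let tokens: [String]
    private var position = 0

    init(tokens: [String]) {
        self.tokens = tokens
    }

    private var current: String? {
        return position < tokens.count ? tokens[position] : nil
    }

    func parse() throws -> Expr {
        let expr = try parseSum()
        // Leftover tokens mean the input was not a single valid expression.
        guard current == nil else { throw ParseError.unexpectedToken(current) }
        return expr
    }

    // sum := product ("+" product)*
    private func parseSum() throws -> Expr {
        var lhs = try parseProduct()
        while current == "+" {
            position += 1
            let rhs = try parseProduct()
            lhs = .binary("+", lhs, rhs)
        }
        return lhs
    }

    // product := number ("*" number)*
    private func parseProduct() throws -> Expr {
        var lhs = try parseNumber()
        while current == "*" {
            position += 1
            let rhs = try parseNumber()
            lhs = .binary("*", lhs, rhs)
        }
        return lhs
    }

    private func parseNumber() throws -> Expr {
        guard let token = current, let value = Int(token) else {
            throw ParseError.unexpectedToken(current)
        }
        position += 1
        return .number(value)
    }
}

let tree = try! Parser(tokens: ["1", "+", "2", "*", "3"]).parse()
print(tree) // (1 + (2 * 3))
```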
Once it knows that, it can move on to semantic analysis, which mostly checks that all your types match each other. Once we have a semantically correct representation, we can go to the second block, the middle-end, which is responsible for analysis and optimizations, for instance dead code elimination or tail call elimination. What I want to note here is that up to the end of the middle-end, the compiler is completely unaware of which architecture it is compiling for. Whether you are compiling for x86 or arm64, these steps are all the same; only the back-end is responsible for machine-level optimizations for a specific architecture, a specific processor, and, of course, at the end, for generating instructions that can run directly on that processor. That is the broad overview of any compiler. Now let's move to Swift, and let's start not from the beginning but from the end: the back-end.
Some of you may already know that Swift reuses an existing back-end called LLVM, exactly the same back-end that Objective-C uses. What is LLVM? As I said, it is a back-end compiler: it takes LLVM IR, the LLVM Intermediate Representation, as input and outputs unlinked machine code in an object file (".o"). And what is LLVM IR? You can think of it as the assembly language of a theoretical processor. No processor in the world can execute it directly; it exists only to feed the LLVM back-end, which produces machine code as its output. In general, any back-end is quite complicated because it operates at a low level of abstraction, and that's why many compilers reuse an existing back-end rather than create a new one. As I said, Swift reuses the same LLVM as the original Objective-C compiler, Clang, which can compile C, Objective-C and C++. But many more languages use LLVM under the hood, to mention a few: Rust, C#, or even Kotlin.
In case you are not aware: besides the most popular Kotlin compiler, which targets the JVM, there is another compiler called Kotlin/Native that uses LLVM to create machine code. Those Kotlin applications no longer rely on the JVM and can run directly on the processor. OK, so now we know about the back-end. Let's go to the Swift compiler and look at every single step from the beginning. As I said, we have a Swift file and we are going to do the first step. Actually, the first step is tokenizing, but it is really simple: it just splits your code into small tokens.
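The tokenizing step can be sketched in a few lines. This is a hypothetical, heavily simplified lexer for a Swift-like snippet, not the real Swift lexer: the Token cases and the keyword set are assumptions, escape sequences inside string literals are ignored, and every operator character becomes its own token (so += is split in two).

```swift
// Token kinds for a hypothetical, heavily simplified Swift-like lexer.
enum Token: Equatable {
    case keyword(String)
    case identifier(String)
    case number(String)
    case stringLiteral(String)
    case punctuation(Character)
}

let keywords: Set<String> = ["if", "else", "func", "let", "var", "return"]

func tokenize(_ source: String) -> [Token] {
    let chars = Array(source)
    var tokens: [Token] = []
    var i = 0

    // Collects characters while `predicate` holds, advancing `i` past them.
    func take(while predicate: (Character) -> Bool) -> String {
        var text = ""
        while i < chars.count && predicate(chars[i]) {
            text.append(chars[i])
            i += 1
        }
        return text
    }

    while i < chars.count {
        let c = chars[i]
        if c.isWhitespace {
            i += 1
        } else if c.isLetter || c == "_" {
            // Identifier or keyword: letters, digits and underscores.
            let word = take { $0.isLetter || $0.isNumber || $0 == "_" }
            tokens.append(keywords.contains(word) ? .keyword(word) : .identifier(word))
        } else if c.isNumber {
            tokens.append(.number(take { $0.isNumber }))
        } else if c == "\"" {
            // String literal: everything up to the closing quote (no escape handling).
            i += 1
            tokens.append(.stringLiteral(take { $0 != "\"" }))
            i += 1  // skip the closing quote
        } else {
            // Braces, dots and operator characters each become a one-character token.
            tokens.append(.punctuation(c))
            i += 1
        }
    }
    return tokens
}

let tokens = tokenize("if title.count > 10 { value += \"...\" }")
for token in tokens {
    print(token)
}
```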
The first step is parsing. As I said, it creates the Abstract Syntax Tree, a one-to-one representation of your Swift code. What's interesting about Swift is that we can very easily observe the output of each compilation step. So how do we get that output? The easiest way is to open Xcode, go to the Report Navigator, find the compilation step for your Swift file, and expand the log entry using this button. You will see the terminal command that Xcode runs under the hood. By default, in a Debug build Xcode runs a very similar command for every single Swift file you have, because Debug builds use the "single file" mode by default. As you can see, there are many different arguments, which we will skip for the moment. But if we add one extra argument, -dump-parse, we will see on the console what the AST looks like after the first, parsing step. So let's have a look! Just for reference, here is the Swift file used for this compilation. You don't have to try to read it yet; we will come back to it. But I believe that if I show you the AST (zooming in), you will easily be able to tell which nodes in the tree correspond to which parts of the Swift file. It is a one-to-one representation, so it is self-explanatory.
What I want to note here is the variable value: since I, as a Swift developer, didn't provide its type and let the compiler infer it, the type is still unknown in the AST after this first step. The compiler doesn't know yet that it will be a String. So that was the first step, for which we used -dump-parse. Now let's look at the next step.
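The difference between the parse-only AST and the type-checked AST can be modelled as an optional type that semantic analysis fills in. In this hypothetical sketch (Literal, VarDecl, parseVarDecl and typeCheck are illustrative names, not compiler APIs), a declaration like value without an annotation has no type after parsing, and type checking infers String from the initializer or reports a mismatch with an explicit annotation.

```swift
// Hypothetical, simplified AST node for a declaration like `var value = "..."`.
enum Literal {
    case string(String)
    case integer(Int)
    case boolean(Bool)
}

struct VarDecl {
    let name: String
    let initializer: Literal
    var type: String?   // only known after semantic analysis, unless written explicitly
}

enum TypeCheckError: Error {
    case mismatch(declared: String, inferred: String)
}

// "Parsing" records only what is in the source; with no annotation, the type stays nil.
func parseVarDecl(name: String, initializer: Literal, annotation: String? = nil) -> VarDecl {
    return VarDecl(name: name, initializer: initializer, type: annotation)
}

// "Semantic analysis" infers the type from the initializer and checks any explicit annotation.
func typeCheck(_ decl: VarDecl) throws -> VarDecl {
    let inferred: String
    switch decl.initializer {
    case .string: inferred = "String"
    case .integer: inferred = "Int"
    case .boolean: inferred = "Bool"
    }
    if let declared = decl.type, declared != inferred {
        throw TypeCheckError.mismatch(declared: declared, inferred: inferred)
    }
    var checked = decl
    checked.type = inferred
    return checked
}

let parsed = parseVarDecl(name: "value", initializer: .string("Hello"))
let checked = try! typeCheck(parsed)
print(parsed.type ?? "unknown")   // unknown
print(checked.type ?? "unknown")  // String
```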
That step is semantic analysis, and of course we can serialize its output as well. Let me show you what the typed AST looks like for the same file. This is just part of the output, but if we look at the variable now, its type is already String. So after this step we know which types the compiler should actually use. That was the second step, so let's go further. The last step of the Swift front-end is SILGen, the SIL generator, and as the name suggests, it generates SIL. Let's take a look at it. It looks like this, so our representation gets a little fuzzy: it looks like a mix of some assembly code and maybe some Swift. We will come back to SIL in a minute, but at this stage ____.
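Besides appending -dump-parse to the command from the Report Navigator, each step's output can also be requested directly from the swiftc driver. The lookup table below collects, as a quick reference, the flag for every step in this walkthrough. The enum, its case names and the file name Title.swift are made up for illustration; the flags are standard swiftc options, although the exact format of their output changes between compiler versions.

```swift
// The compilation steps walked through in this talk, each paired with the swiftc flag
// that prints its output. The stage names are our own labels; the flags are swiftc's.
enum CompilationStage: CaseIterable {
    case parse, typeCheck, silGen, silOptimizer, irGen, assembly

    var swiftcFlag: String {
        switch self {
        case .parse: return "-dump-parse"         // AST right after parsing, types unresolved
        case .typeCheck: return "-dump-ast"       // type-checked AST
        case .silGen: return "-emit-silgen"       // raw SIL straight out of SILGen
        case .silOptimizer: return "-emit-sil"    // canonical SIL (also optimized when -O is passed)
        case .irGen: return "-emit-ir"            // LLVM IR handed to the LLVM back-end
        case .assembly: return "-emit-assembly"   // target-specific assembly
        }
    }

    // Builds a command line for a single file; `file` is whatever source you want to inspect.
    func command(for file: String) -> String {
        return "swiftc \(swiftcFlag) \(file)"
    }
}

for stage in CompilationStage.allCases {
    print(stage.command(for: "Title.swift"))
}
```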
So that was the front-end. The output we have just seen, called raw SIL, is passed to the middle-end. Notice that the number of lines keeps increasing: the deeper we go in the compilation, the more our concise Swift instructions are split into smaller ones. In the middle-end we have the SIL optimizations, the main part of the middle-end. Their output looks quite similar to the previous one, so I won't show it on the slides; instead, let's look at the output of the next step, LLVM IR. As we go lower and lower in abstraction, it looks more and more esoteric. The last block, the back-end, receives this LLVM IR. It is responsible for architecture-specific optimizations and for emitting machine code. Of course, we can also look at the assembly code in the .s file; this is something most of you have already seen in Xcode. So that was the entire Swift compilation process. Let's recap: we had a front-end for parsing and for ensuring syntactic and semantic correctness; in the middle-end we had some optimizations; and depending on which architecture we target, the back-end (LLVM) creates machine code for arm64 or x86.
Now we'll move on to Swift Intermediate Language. It is located in the middle-end, and it was originally developed to make optimizations easier for the Swift core team. So what we will look at now is used in the middle-end, in the SIL optimizer step. As the name suggests, Swift Intermediate Language is a language, a fully-fledged one, so technically I could walk you through its entire syntax. Rather than that, I will give you its most important properties and let you know what kind of information you can find there. For that, we will use the sample I have already shown you. Let me give you a moment to read it and understand what is going on here. It's not so complicated.
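The sample itself is only on the slides, so here is a hypothetical reconstruction based on the description that follows; the function name truncate, the limit parameter and the ASCII "..." ellipsis are assumptions. What matters for the SIL discussion is that it contains exactly one branch and one += on the value variable.

```swift
// Hypothetical reconstruction of the slide's sample; the exact original code is not
// in the transcript. Titles longer than `limit` characters are cut and get "..." appended.
func truncate(_ title: String, limit: Int = 10) -> String {
    if title.count > limit {                  // the single branch; in SIL it ends a basic block
        var value = String(title.prefix(limit))
        value += "..."                        // the ellipsis appended to `value`
        return value
    }
    return title
}

print(truncate("Swift Intermediate Language"))  // Swift Inte...
print(truncate("SIL"))                          // SIL
```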
I suppose all of you can already see that it just truncates your title if it is longer than 10 characters and appends an ellipsis, and if it is shorter, it returns the same string. Simple as it is, it includes one branch, the "if" here, and also a bit of syntactic sugar where we append the ellipsis to the end of the value variable. Now let's review the SIL. It looks like this, so it's quite verbose. First characteristic: every single line in SIL contains only one simple instruction, and the values these instructions produce are numbered starting from zero: %0, %1, and so on, up to infinity.
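The idea of splitting one concise expression into single-operation lines with numbered results can be sketched as follows. This is a hypothetical illustration: the Expr and Lowering types and the instruction names are made up and are not real SIL syntax; only the %N numbering scheme mirrors SIL. Operands are lowered before the instruction that uses them, so every number refers to a value defined earlier.

```swift
// Hypothetical sketch: lower a nested expression into one-operation-per-line instructions
// whose results are numbered %0, %1, ... The instruction names are made up, not real SIL.
indirect enum Expr {
    case argument(String)
    case literal(Int)
    case add(Expr, Expr)
    case multiply(Expr, Expr)
}

struct Lowering {
    private(set) var instructions: [String] = []

    // Emits instructions for `expr` (operands first) and returns the value holding its result.
    mutating func lower(_ expr: Expr) -> String {
        switch expr {
        case .argument(let name):
            return emit("argument \(name)")
        case .literal(let value):
            return emit("literal \(value)")
        case let .add(lhs, rhs):
            let left = lower(lhs)
            let right = lower(rhs)
            return emit("add \(left), \(right)")
        case let .multiply(lhs, rhs):
            let left = lower(lhs)
            let right = lower(rhs)
            return emit("multiply \(left), \(right)")
        }
    }

    // Each new instruction gets the next free number, like SIL's %N values.
    private mutating func emit(_ operation: String) -> String {
        let value = "%\(instructions.count)"
        instructions.append("\(value) = \(operation)")
        return value
    }
}

// a + b * 2 becomes five single-operation lines.
var lowering = Lowering()
let result = lowering.lower(.add(.argument("a"), .multiply(.argument("b"), .literal(2))))
lowering.instructions.forEach { print($0) }
print("result in \(result)")
```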
There are no named variables anymore. Let me show you: every single line here has an index. Second, SIL is strongly typed, so you will see almost exactly the same types that you use in Swift; take a look and you will see the String type and some pointers to String. Third, branching, which is quite interesting. What do I mean by branching? Branching breaks the flow of instructions we are executing, depending on some condition. The simplest branching is caused by an "if": depending on whether the condition is true or false, we execute a different set of instructions. How is this represented in SIL? In SIL, this function is split into several basic blocks, each a straight run of instructions. This means it is impossible to branch on a condition in the middle of a basic block, only at its end. I know it sounds a little weird, so let's look at the example. Here we have basic block zero (bb0) at the top; we have to execute all its instructions all the way down to the end of the block, where, depending on the value %17, we branch to bb1 or bb2.
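Once a function is split into basic blocks, branching becomes a graph: each block's terminator names the blocks it can jump to. The sketch below models that graph with a hypothetical ControlFlowGraph type (not a compiler API), derives each block's predecessors by inverting the successor lists, and counts entry-to-exit paths, which is the kind of number a branch path coverage tool would need. It assumes an acyclic graph; loops would need extra handling.

```swift
// Hypothetical model of a function's control-flow graph: each basic block maps to the blocks
// its terminator can jump to (a conditional branch has two successors, a return has none).
struct ControlFlowGraph {
    let entry: String
    let successors: [String: [String]]

    // Inverting the successor lists gives each block's predecessors,
    // the same information SIL prints as a `// Preds:` comment next to a block label.
    func predecessors(of block: String) -> [String] {
        return successors
            .filter { $0.value.contains(block) }
            .map { $0.key }
            .sorted()
    }

    // Counts distinct entry-to-exit paths, e.g. as the denominator for branch path coverage.
    // Assumes the graph has no loops; a cycle would make this recurse forever.
    func pathCount(from block: String? = nil) -> Int {
        let next = successors[block ?? entry] ?? []
        if next.isEmpty { return 1 }   // a block ending in `return` closes one path
        return next.reduce(0) { total, successor in total + pathCount(from: successor) }
    }
}

// The shape described in the talk: bb0 branches to bb1 or bb2, and bb1 continues to bb2.
let cfg = ControlFlowGraph(
    entry: "bb0",
    successors: ["bb0": ["bb1", "bb2"], "bb1": ["bb2"], "bb2": []]
)
print(cfg.predecessors(of: "bb1"))  // ["bb0"]
print(cfg.predecessors(of: "bb2"))  // ["bb0", "bb1"]
print(cfg.pathCount())              // 2
```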
Fourth, ARC. SIL includes retains and releases, so if you are curious where exactly your reference counts are increased and decreased, you can take a look. Fifth, it is rich in comments. You may have already noticed that there are a lot of comments in the SIL representation of the code. The compiler just skips those comments, but they help us understand what is actually going on there, and we can use them for some analysis. What kind of comments can we expect? Demangled function names, basic block predecessors and many more. But I want to concentrate on basic block predecessors. Here we have basic block number 1, and the comment tells us that its only predecessor is basic block number 0. This means we can reach that point only from basic block number 0. Where can this be beneficial?
Let's assume you want to create a tool that measures branch path coverage rather than the line coverage that Xcode reports. You can take the SIL, analyze how many potential branches your code may take, and then estimate how many of those branches your unit tests should cover. So, for instance, these comments could be useful for creating such a tool. We can also see that bb2's predecessors are bb0 and bb1, so it can be reached from either of them. The last but not least characteristic I want to present: SIL can be used for code distribution. What does that mean? Once you have a SIL text file, you can save it to disk, send it over the network, and then start creating machine code directly from that point, skipping the previous steps. This is super powerful. Imagine you are creating a server-side application for many different Linux distributions, so you have to rebuild the same code many times. You don't have to do it from scratch, from Swift, but only from the SIL representation. That is the first benefit of SIL being usable this way. The second one is that, technically, we can add some metaprogramming in SIL. I'm a bit skeptical about adding something there in production code, but it is possible to add some extra variables, some text, etc.
Let's summarize the characteristics we've learned about. SIL takes your concise Swift instructions and splits them into smaller, simpler instructions. It is strongly typed, so it still keeps track of all the types. Functions are split into basic blocks. It includes the ARC retains and releases. SIL comments are very helpful if you want to build a tool on top of it. Finally, it should be possible to compile from SIL. I say "should" because at the moment there are bugs filed against the compiler, even on the trunk branch, but hopefully the Swift team will address them in the future. Now, to summarize all the potential benefits we can get out of SIL: originally, SIL was created just to make the Swift compiler team's life easier, to let them run optimizations. Unless you actually want to contribute to that core part, this is out of our scope. But if we compile for different architectures, then, because SIL can be used for code distribution, we can speed up our rebuild times. It is also helpful for testing tools: I mentioned path coverage, and it could also be possible to create mutation tests
thanks to SIL. So what are mutation tests, by the way? Just to remind you, mutation tests in general evaluate whether your unit tests are good, that is, whether potential bugs in the implementation would actually turn your tests red. So how does it work? Assume we already have an implementation that works and is defined in some way, and a set of tests that pass, so green tests. That's the starting point. Then a tool takes your implementation and tries to mutate it somehow, for instance by replacing a plus with a minus, the simplest mutation. This is called "creating a mutant", and once we have changed the behavior of our implementation code, we expect at least one test to fail. If all the tests are still green, it means we had a backdoor, a hole in our tests: we are not verifying something. As "talk is cheap", let me give you a sneak peek of the tool I'm currently working on, which of course uses SIL under the hood.
Let's go to Xcode. Here is a very simple class: the implementation is on the left and the tests are on the right. Let's follow these steps. We have a class Welcome that exposes one function, welcome, which takes the guest's name as a String, saves the most recently welcomed guest into a variable, and informs its delegate about the new guest who entered. The test verifies one thing: we instantiate the class and link its delegate to a delegate mock; in "Act" we call the welcome function with the "Guest1" argument; then we verify that the value is stored. Yes, it's an OK test, and we already have 100% code coverage, but let's try to run a mutation test for it. Let me run it. It is implemented as a Fastlane plugin written in Ruby. First it rebuilds the entire project, which takes a while, and in the next step it tries to find potential mutations it can introduce. And we see that we've got 4 mutations there.
Two of them are relevant at the moment, referring to the Welcome class I showed you. The first one says: "I can replace the body of the welcome function with an empty body." Another one: "I can do exactly the same with infoAboutGuest." Then it runs all the unit tests against each implementation the tool has changed. We see that the first mutant has been killed, correctly, but the second one survived. What happened? I know that the mutation test replaced this function with an empty one, so it did something like this.
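The demo code itself is not in the transcript, so here is a hypothetical reconstruction based on the description: a Welcome class, a delegate mock that records guests, the existing test that only checks the stored guest, and the missing delegate test. The protocol name WelcomeDelegate, the method guestEntered(_:) and the property names are assumptions. The real tool mutates SIL and reruns XCTest; here each mutant is simulated with a Mutant flag and plain functions stand in for tests, purely to show why the empty infoAboutGuest mutant survives the first test and is killed by the second.

```swift
// Hypothetical reconstruction of the demo; names are illustrative. Mutants are simulated with a
// flag here, whereas the real tool mutates SIL and reruns the XCTest suite.
enum Mutant {
    case original              // unmodified implementation
    case emptyWelcome          // body of welcome(_:) removed
    case emptyInfoAboutGuest   // body of infoAboutGuest(_:) removed
}

protocol WelcomeDelegate: AnyObject {
    func guestEntered(_ name: String)
}

final class Welcome {
    weak var delegate: WelcomeDelegate?
    private(set) var lastWelcomedGuest: String?
    private let mutant: Mutant

    init(mutant: Mutant = .original) {
        self.mutant = mutant
    }

    func welcome(_ guest: String) {
        guard mutant != .emptyWelcome else { return }
        lastWelcomedGuest = guest
        infoAboutGuest(guest)
    }

    func infoAboutGuest(_ guest: String) {
        guard mutant != .emptyInfoAboutGuest else { return }
        delegate?.guestEntered(guest)
    }
}

// The mock records every guest it is told about.
final class DelegateMock: WelcomeDelegate {
    private(set) var welcomedGuests: [String] = []

    func guestEntered(_ name: String) {
        welcomedGuests.append(name)
    }
}

// Plain functions stand in for test cases; each returns true when it passes.
func testStoresLastGuest(_ mutant: Mutant) -> Bool {
    let sut = Welcome(mutant: mutant)
    let mock = DelegateMock()
    sut.delegate = mock
    sut.welcome("Guest1")
    return sut.lastWelcomedGuest == "Guest1"
}

// The test that was missing in the demo: the delegate must be informed.
func testInformsDelegate(_ mutant: Mutant) -> Bool {
    let sut = Welcome(mutant: mutant)
    let mock = DelegateMock()
    sut.delegate = mock
    sut.welcome("Guest1")
    return mock.welcomedGuests == ["Guest1"]
}

// A mutant is killed when at least one test fails against it.
func killedMutants(tests: [(Mutant) -> Bool], mutants: [Mutant]) -> [Mutant] {
    return mutants.filter { mutant in tests.contains { test in !test(mutant) } }
}

let mutants: [Mutant] = [.emptyWelcome, .emptyInfoAboutGuest]
let before = killedMutants(tests: [testStoresLastGuest], mutants: mutants)
let after = killedMutants(tests: [testStoresLastGuest, testInformsDelegate], mutants: mutants)
print("killed \(before.count)/\(mutants.count), then \(after.count)/\(mutants.count)")
// killed 1/2, then 2/2
```

The two "empty body" mutants match the ones reported in the demo; the AppDelegate mutants are left out of this sketch.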
Oh yeah, I completely forgot to verify my delegate mock, which actually appends the guest's name here. OK, so I have to write one extra test for that; let's do it. We check that the delegate is informed. "Act" will be exactly the same, so we are still welcoming. In "Assert" we expect the array of all welcomed guests to equal an array with one element: the mock's welcomed guests equal ["Guest1"]. I think that's everything, and I didn't make any typos. Let's run the tests again; once it builds, I will show you the simulator so you can see what happens there after a mutation is applied. Yeah, it found four mutations to apply. And here, on the simulator, we see that it very quickly introduces a mutation and then runs the unit tests for it. It doesn't start from the beginning, rebuilding from Swift; it starts directly from SIL. Great, we killed another mutant here. So we covered it with an extra test, and the quality of our tests increased from 25% to 50%. Of course, there are also some uncovered mutants in AppDelegate, but this is something I expect: I'm not testing AppDelegate, so I don't care about those mutants. So as you can see, we had 100% code coverage, but our tests were still missing something, and thanks to mutation tests I was able to find where the hole in the unit tests was.
Let's go back to the slides to summarize. At the beginning we learned that the Swift compiler consists of a front-end specific to the Swift language and a back-end that reuses the same LLVM back-end as Objective-C. If you are curious about the steps of Swift compilation, the compiler offers a lot of options to serialize each step's output in a human-readable form. We also learned that SIL is a simplified representation of Swift code. It is completely architecture-independent, because it sits in the middle-end. It was originally created for optimizations done by the Swift compiler core team, but it can also be useful if you want to create a third-party tool. So if you have an idea for making our lives as Swift developers easier, now you can consider SIL. Thank you, that's everything I have for today. Do we have any questions?
>Thanks, great talk.
>Thank you very much.
>So this SIL file is an intermediate step in compilation, but does it contain any information about the Swift runtime, or is that added later?
>What do you mean by the Swift runtime?
>Swift has some runtime things, like the retain/release counters used under the hood, which you don't see when you write Swift code. Does SIL contain those runtime things, or are they added later?
>There are two representations of SIL. The first one is called raw SIL; it comes before optimization, but it already includes the ARC retains and releases and things like Equatable and Hashable conformances. Those things are added in SIL. Then, after the mandatory optimization passes, there is what is called canonical SIL, which has some further changes, for example after some dead code elimination. So this representation also includes all those magical things that happen under the hood.
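Dead code elimination, mentioned in the answer above as one of the passes applied to SIL, can be illustrated with a small sketch over numbered, single-operation instructions in a single basic block. The Instruction type and its fields are hypothetical, and this is not the Swift optimizer's actual algorithm: it simply walks backwards, keeps instructions that have side effects or whose result is used by something already kept, and drops the rest.

```swift
// Hypothetical dead-code-elimination sketch for straight-line code (a single basic block)
// written as numbered, single-operation instructions.
struct Instruction {
    let result: String?        // e.g. "%2"; nil when the instruction produces no value
    let operands: [String]     // values this instruction reads
    let hasSideEffects: Bool   // e.g. a return or a call; such instructions are always kept
    let text: String           // printable form, for display only
}

func eliminateDeadCode(_ instructions: [Instruction]) -> [Instruction] {
    var live = Set<String>()
    var kept: [Instruction] = []
    // Walk backwards so that every use is seen before the definition it depends on.
    for inst in instructions.reversed() {
        let resultIsUsed = inst.result.map { live.contains($0) } ?? false
        if inst.hasSideEffects || resultIsUsed {
            kept.append(inst)
            live.formUnion(inst.operands)   // its operands are now needed too
        }
    }
    return kept.reversed()
}

let program = [
    Instruction(result: "%0", operands: [], hasSideEffects: false, text: "%0 = literal 1"),
    Instruction(result: "%1", operands: [], hasSideEffects: false, text: "%1 = literal 2"),
    Instruction(result: "%2", operands: ["%0", "%1"], hasSideEffects: false, text: "%2 = add %0, %1"),
    Instruction(result: "%3", operands: ["%0", "%0"], hasSideEffects: false, text: "%3 = multiply %0, %0"),
    Instruction(result: nil, operands: ["%2"], hasSideEffects: true, text: "return %2"),
]

// %3 is never used, so it disappears; everything feeding the return stays.
for inst in eliminateDeadCode(program) {
    print(inst.text)
}
```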
>Okay, thanks.
>Hello, thanks. I have a question: what if you have multiple tests, a lot of tests? Are you planning some optimization to run these mutations for all of the tests at the same time?
>I think that is impossible, because it uses the main thread. If you wanted to run several tests on one simulator, it would break, because it uses global variables. Or maybe you are asking whether I am trying to use different simulators to speed it up…
>No, no. I'm asking what happens if you have a lot of tests: let's say that for 10 tests we find 20 mutations. So you'll be running one step, then the next mutation, OK?
>Yeah, so for every mutation I will do one test run.
>OK. Thank you.
>That's a good point, because mutation tests can be time-consuming.
>Thanks for a great talk. My question is: SIL is written in C, right?
>No.
>In Swift?
>The reader, the parser and the whole compiler are written in C, or actually C++. SIL itself is just a representation. You can think…
>No, no, no. I am asking about the implementation itself; the representation doesn't matter to me. OK, we can benefit from it, but you have to know C++ if you want to dig into it, right? But as far as I saw, you were writing a plugin for fastlane, and you were obviously using our favourite language, Ruby. So what do you use for that? Do you just take the output and parse it yourself, or do you use some external library to read it, I don't know, in Swift or Ruby?
>So the general question is how I read it?
>Exactly: how do you understand the SIL output from another language, probably Ruby?
>SIL is a text file, as I showed you, so I actually had to perform almost the same steps I presented on the slides: tokenizing, syntax and semantic analysis, and so on. However, I didn't implement 100% parsing of it. I only take the most important parts of the SIL representation, just what I actually need. As far as I know, there is no SIL parser in any language other than C++, which you mentioned. That's why I had to write it on my own.
>OK, so it's just a simplified version…
>Exactly, exactly. I don't think anybody else could reuse it.
>Maybe that's your niche? You could create something? In Swift?
>In Swift?! OK!
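The approach described in the answer above, reading only the parts of SIL you need, can be sketched in Swift, as the audience suggested. This is a hypothetical, deliberately partial reader: SILFunctionSummary and summarize(_:) are illustrative names, the embedded SIL is hand-written and abridged rather than real compiler output, and it assumes block labels start at the beginning of a line while instructions are indented. It collects function names and the predecessor lists from the // Preds: comments, which is enough input for something like the branch path coverage idea.

```swift
// Hypothetical partial SIL reader: it only extracts function names and, for each basic block,
// the predecessors listed in `// Preds:` comments. It is not a full SIL parser.
struct SILFunctionSummary {
    let name: String
    var predecessors: [String: [String]] = [:]   // block label -> predecessor labels

    init(name: String) {
        self.name = name
    }
}

func summarize(_ sil: String) -> [SILFunctionSummary] {
    var functions: [SILFunctionSummary] = []
    for rawLine in sil.split(separator: "\n") {
        let line = String(rawLine)
        if line.hasPrefix("sil "), let at = line.firstIndex(of: "@") {
            // Function declaration: the name runs from after "@" up to the next space.
            let name = line[line.index(after: at)...].prefix(while: { $0 != " " })
            functions.append(SILFunctionSummary(name: String(name)))
        } else if line.hasPrefix("bb"), !functions.isEmpty {
            // Block label such as "bb1:" or "bb0(%0 : $String):" (instructions are indented).
            let label = String(line.prefix(while: { $0 != "(" && $0 != ":" }))
            let words = line.split(separator: " ")
            var preds: [String] = []
            if let marker = words.firstIndex(of: "Preds:") {
                preds = words[(marker + 1)...].map(String.init)
            }
            functions[functions.count - 1].predecessors[label] = preds
        }
    }
    return functions
}

// Hand-written, abridged SIL in the shape of the truncation example (not real compiler output;
// real function names are mangled and the blocks contain many more instructions).
let silText = """
sil hidden @truncate : $@convention(thin) (@guaranteed String) -> @owned String {
bb0(%0 : $String):
  cond_br %17, bb1, bb2
bb1:                                              // Preds: bb0
  br bb2
bb2:                                              // Preds: bb0 bb1
  return %30 : $String
}
"""

let summaries = summarize(silText)
for function in summaries {
    print(function.name, function.predecessors.sorted { $0.key < $1.key })
}
```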
>OK, thank you! SIL is a text file, which means I could write it manually. Is there any reason or use case where it makes sense to write it by hand?
>Oh, I don't believe so; it would be very complicated to write. You could do exactly the same by writing LLVM IR, or you could write assembly. It's kind of the same story.
>I've written assembly.
>Oh, really? Then you may find SIL a very nice language.
>Thanks! More questions?
>I have some questions. Basically, SIL is like our bytecode, the JVM bytecode. There we have something like bytecode weaving, so we can change the bytecode to solve a problem. Is it the same?
>Yes, as far as I understand what it means.
>Because in Java we had to write a lot of boilerplate. Now we use Kotlin, but I am talking about the Java ages. We had two ways to generate code. We could use annotation processing to preprocess something and add more to our code, but that would still be Java, which the compiler then compiles to bytecode. And we had bytecode weaving, and people did a lot of stuff based on bytecode. And now it is said (at least some people say so): if you want to mess with the bytecode, there is probably something wrong.
>I agree, as I said: it's dangerous stuff. Pretty powerful, you can change all these things. For mutations, I think it's an aspect of writing tests: you use it to change the code to get some benefit. But imagine that some guy does something wrong with SIL, trying to implement some … analytics, for example post-process analytics. So what do you think about it?
>That's too dangerous. Yeah, I'm totally against metaprogramming, especially in SIL.
>Yes. Because I know you're a big fan of testing, I think that's why you researched SIL, and you probably want to build this library for mutation tests. That's awesome, but do you have any other ideas for how to use it?
>Any other than for testing? I wouldn't recommend using it to modify production code. Generally, if I did something wrong in my tool, what would the outcome be? Your mutation tests would just report a higher or lower test quality; nothing bad happens. If you get an error in production code, that's a terrible experience. So yeah, tests are the most you can get out of it.
>This is actually the perfect use case, because I was wondering what the use case for this is. Last thing: we have the Lombok library, which uses bytecode weaving to, for example, make your classes immutable. So this is something like that. It is really strange to add features to a language through "SIL weaving" or something like that. Thanks!
>Great presentation, I love it. I assume you would like to release such a library. What's the release date?
>Unfortunately, I am using Swift 4.2, or rather what is still called the master branch, because all the already released Swift versions have a bug that I, unfortunately or fortunately, was able to fix in the Swift compiler repository. But every time I check a new version from trunk, it's still not ready for code distribution. That's why I said that in theory, or in the future, it will be possible to create code that runs on the machine out of SIL. At the moment we still have to wait, at least for Swift 4.2. They are doing a lot of work on memory ownership, which is why only some of the trunk versions I tried were more or less able to rebuild from SIL. So I cannot promise that it will work on Swift 4.2; for 5.0 it is reasonable to say "maybe".
>Thanks. So with every version of Swift we get a new version of SIL?
>No. SIL is a language, and its definition is quite well documented, but the team is still adding some extra things; for instance, they want to keep track of who owns an object, and that's why compilation crashes whenever you try to build machine code from SIL. So SIL stays the same, but you are not able to actually create mutations.
>Okay, so if you create your library once, SIL will never change? You are just mutating the SIL file, right?
>I cannot answer that; I have no idea whether they are planning to change SIL in the future.
>Because with every version of Java the bytecode is different; they keep adding and changing stuff. Okay, thanks!
>The last one: LLVM is really good stuff. That's why Kotlin/Native can actually work on iOS. So do you think Kotlin will be, maybe not the main language, but one of the languages that can be used for iOS apps?
>I have no idea.
>You have no idea? Do you have? ???
>Is there any difference between SIL generated on Linux and SIL generated on a Mac?
>No, it is still exactly the same SIL, unless the Clang importer is involved because you import things from Objective-C, for instance. If you write pure Swift code, it will be exactly the same. That's why you can take SIL created on a Mac and then compile it for Linux.
>OK, thanks.
>OK, that's it! So give a big applause for Bartosz.