### A Pluto.jl notebook ###
# v0.20.4

using Markdown
using InteractiveUtils

# ╔═╡ 828743bf-eae7-40d6-9d57-077be68ad521
using BenchmarkTools

# ╔═╡ bea93f23-efea-45f4-8838-a12bae41101e
begin
	using BioSequences
	# In the Kmers library, a K-mer is a parametric type that has a K parameter, which is just the length of the K-mer, and an N parameter, which is the number of words that the K-mer uses. Note that K and N are values, not types!
	struct Kmer{A <: Alphabet, K, N} <: BioSequence{A}
		data::NTuple{N, UInt}
	end
end

# ╔═╡ 89309964-1d1b-4ddc-ad3e-3133ea411564
md"
## Julia Types & Optimization
### Things we'll cover
- Why typing matters
- Union Types
- Parametric Types
- Bonus Funky Stuff (Singleton Traits)
"

# ╔═╡ db853800-fea3-11ef-1c7b-97635aa9381b
md"
### Typing Matters
##### The ambiguity issue
"

# ╔═╡ 5bc23e29-e1b4-4ec5-9c39-1b0c564b099e
md"
Let's define 2 structs, but specify field types for only one of them (the fields of the other one default to Any).
"

# ╔═╡ 5db975e0-230b-417b-8d6d-fc53c8ffb94f
struct any_struct
	var1
	var2
end

# ╔═╡ 52e24b55-c223-4f1b-9a30-8e86168347e2
struct typed_struct
	var1::Int64
	var2::Float32
end

# ╔═╡ 23ebb3ca-71af-4687-bc49-1b198f0d433e
md"
Let's have a quick look under the hood when we instantiate the structs:
"

# ╔═╡ 4646d54d-4e26-4ef2-a2e3-78222593ea0a
@code_llvm any_struct(1, 2.0)

# ╔═╡ 74374ef3-904d-4c84-9d4a-3b9f23ad602e
@code_llvm typed_struct(1, 2.0)

# ╔═╡ 235e96fb-3356-43b1-8c38-370c9596d4b1
md"The Any version is a lot less efficient than the typed version.
- It has heap allocations and is therefore subject to garbage collection (`@ijl_gc_pool_alloc`), whereas the typed version allocates on the stack (faster) and has no garbage collector interactions (this may not always be true for more complex structures, but more specific types still give the compiler more room to optimize).
- It \"stores\" its values as an array of boxed values (`[2 x {}*]`), meaning you have to follow pointers to get the fields' values. The typed version stores the values directly (`{ i64, float }`).

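
The two points above can also be checked without reading LLVM IR, by asking Julia about the memory layout directly. This is only a sketch: `LoosePair` and `TypedPair` are hypothetical stand-ins for `any_struct` and `typed_struct`, and the size given in the last comment assumes a 64-bit machine.

```julia
# Hypothetical stand-ins for any_struct and typed_struct
struct LoosePair
    a        # no annotation, so the field type is Any
    b
end

struct TypedPair
    a::Int64
    b::Float32
end

fieldtypes(LoosePair)   # (Any, Any)
fieldtypes(TypedPair)   # (Int64, Float32)

# isbitstype is true only when every field is stored inline (no pointers to boxes)
isbitstype(LoosePair)   # false
isbitstype(TypedPair)   # true

sizeof(TypedPair)       # 16 on a 64-bit machine: 8 + 4 bytes, padded for alignment
```
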
In general, type ambiguity forces the compiler to jump through many hoops to accommodate it. **All of this adds extra overhead, which you should never pay unless you have absolutely no choice** (or optimization is not a concern).
"

# ╔═╡ bb17f4e0-4dec-42aa-b52c-b35cc0d07648
md"
### Union Types
A cool feature of Julia is that it allows you to create new types that group multiple existing types together.
"

# ╔═╡ f3ab35ab-6581-4dc8-9336-595a0d2c8ff3
unsigned = Union{UInt8, UInt16, UInt32, UInt64}

# ╔═╡ 39e23646-5250-4cfa-9a85-990a4ff57c63
UInt8 <: unsigned

# ╔═╡ 7a6b4f63-3ad5-4597-a5d8-f08087470602
Int8 <: unsigned

# ╔═╡ 9303b586-1a9a-4572-90a2-d43d39759582
unsigned <: Unsigned

# ╔═╡ 0d3e4d05-a09d-4bf6-bcd8-7409e1dd216f
md"
This has many uses, such as allowing for cool overloading dispatch (like we did for Dabus in my last workshop), but it is also very useful for resolving type instability. You can use `@code_warntype` to find such instabilities.
"

# ╔═╡ 02d4572f-ec9b-4e1e-b033-e9c63cd3975e
begin
	struct Unstable
		x::Any # Note that unspecified field types default to Any
	end

	function process_value(u::Unstable)
		return u.x + 1
	end

	unstable_1 = Unstable(3)
	unstable_2 = Unstable(3.5)

	@code_warntype process_value(unstable_1)
end

# ╔═╡ 6536e72a-e9e0-46e7-a720-d35f5c2ff435
begin
	struct Stable
		x::Union{Int64, Float64} # Now with a Union of specific types
	end

	function process_value(s::Stable)
		return s.x + 1
	end

	stable_1 = Stable(3)
	stable_2 = Stable(3.5)

	@code_warntype process_value(stable_1)
end

# ╔═╡ 264ad990-a527-4091-8526-155e03ab7e18
md"
Type instability is the reason why the generated LLVM code we saw earlier becomes less efficient: it has to accommodate the ambiguity. Union allows you to alleviate ambiguity while staying generic when you need your struct/function to work on different types.

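
One way to see what the Union buys you, sketched here with hypothetical names (`LooseBox`, `UnionBox`, `bump`), is to ask the compiler which return type it infers for the same function on both structs. `Base.return_types` is an internal reflection helper, so treat this as an exploratory check rather than something to rely on in real code.

```julia
# Hypothetical structs mirroring Unstable and Stable
struct LooseBox
    x::Any
end

struct UnionBox
    x::Union{Int64, Float64}
end

bump(b) = b.x + 1

# With an Any field, inference gives up: the result could be anything
only(Base.return_types(bump, (LooseBox,)))   # Any

# With a small Union, inference narrows the result to two concrete types
only(Base.return_types(bump, (UnionBox,)))   # Union{Float64, Int64}
```
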
These ambiguities can cause serious slowdowns, have a look:"

# ╔═╡ 48319f28-25ea-453c-b9c3-6db4b6bac7a3
md"
In general, `@code_warntype` is a useful macro to make sure that your code will be compiled with as little ambiguity as possible.
"

# ╔═╡ cf149661-b082-40a9-8442-30605744520f
md"
### Parametric Types
Parametric Types can fulfill a similar role to Union: writing efficient yet generic code. The cool thing about them is that a Parametric Type, once its parameters are specified, becomes a unique type that can be differentiated from other instances that were created with different parameters.
"

# ╔═╡ 9aed4853-9ec3-430a-91f4-2c359819fb08
begin
	struct Parametric_Stable{T}
		x::T # T is the parameter of this parametric type
	end

	function process_value(ps::Parametric_Stable)
		return ps.x + 1
	end

	p_stable_1 = Parametric_Stable(3)
	p_stable_2 = Parametric_Stable(3.5)

	@code_warntype process_value(p_stable_1)
end

# ╔═╡ c1d8ad17-e45d-4e80-8840-8efb8d95b2ba
@btime(process_value($(unstable_1)))

# ╔═╡ d0f1b272-1859-4694-bf49-fe1dd896e806
@btime(process_value($(stable_1)))

# ╔═╡ e7caed18-32c4-47b3-ac74-e739abff2101
@btime(process_value($(p_stable_1)))

# ╔═╡ 13c48240-a941-45f7-a695-93123f28906c
md"
`p_stable_1` and `p_stable_2` now each have their own specific type, allowing you to dispatch depending on their parameters, but both types are also subtypes of the general Parametric_Stable type.
"

# ╔═╡ 0ddf88b5-529b-4918-b945-c58660f6699c
typeof(p_stable_1)

# ╔═╡ d8cc6712-aa7b-429b-91c0-69bb9274ed0c
typeof(p_stable_2)

# ╔═╡ 8c992e8d-5755-44f5-b9ba-403713875313
typeof(p_stable_1) == typeof(p_stable_2)

# ╔═╡ 29475eb9-7c21-4a53-944f-fc6b6c30b685
typeof(p_stable_1) <: Parametric_Stable && typeof(p_stable_2) <: Parametric_Stable

# ╔═╡ acb842c6-4d1f-4aca-8f1a-981a19dc831c
md"
Which means you can write stuff like this:
"

# ╔═╡ 9d5ebbb7-faf9-4a93-9d96-a2d1b68b7331
function fancy_process(u::T) where {T <: Union{Unstable, Stable, Parametric_Stable}}
	return u.x + 1
end

# ╔═╡ f8cd2034-b816-48d1-b923-761263422a16
begin
	@btime(fancy_process($(unstable_1)))
	@btime(fancy_process($(stable_1)))
	@btime(fancy_process($(p_stable_1)))
end

# ╔═╡ 7c63aba5-4d6b-4979-a4cd-58ce8ec47552
md"
### The where clause
The where clause lets you introduce, on the fly, a type parameter that ranges over all the subtypes of a given type (usually an abstract type). For dispatch purposes it behaves like manually writing a Union of the corresponding subtypes; it's just faster to write and easier to maintain.
"

# ╔═╡ 07e2397f-2d03-4e8c-8a2d-fcd7980f93ae
begin
	function add_one_a(x::T) where {T <: Number}
		return x + 1
	end

	function add_one_b(x::Union{Int, Float64})
		return x + 1
	end

	@btime add_one_a(3)
	@btime add_one_b(3)
end

# ╔═╡ 3700196b-46cb-477f-be70-4dd5983e1aef
md"### Parametric Values
Parameters of Parametric Types don't need to be Types themselves! You can use static values to create Parametric Types on the fly that differ from each other based on said parameter.
"

# ╔═╡ 806aaaae-312e-4a3b-8879-81b0b6be1de2
Kmer{DNAAlphabet{2}, 31, 1} == Kmer{DNAAlphabet{2}, 30, 1}

# ╔═╡ 0426b074-5e6a-4abc-9222-68f6e6dd0b9b
Kmer{DNAAlphabet{2}, 31, 1} <: Kmer >: Kmer{DNAAlphabet{2}, 30, 1}

# ╔═╡ f951b862-596d-40e8-b4b0-1735a0c428b6
Kmer{DNAAlphabet{2}, 31, 1}.parameters

# ╔═╡ a23d7005-3159-405f-be4c-6e14d933e956
md"### Bonus: Singletons & Traits
Singletons are field-less structs that can be used as trait types, which makes dispatch easier to extend and modify. It's a more functionally oriented approach that detaches the behaviour of the object from the object itself.

Here's an example that exists in Base [(generator.jl)](https://github.com/JuliaLang/julia/blob/3f4eda64feb0267ae71f603e9ce80fc05966cdc5/base/generator.jl#L60). This example is straight out of the Holy Trait Pattern playbook, which has influenced a lot of Julia's paradigm. I trimmed the excerpt to make it easier to understand."

# ╔═╡ 1d5fae60-39cc-4ede-9fc8-d559d1c32330
begin
	# These are the traits any given iterator can have
	abstract type IteratorSize end
	struct SizeUnknown <: IteratorSize end
	struct HasLength <: IteratorSize end
	struct HasShape{N} <: IteratorSize end
	struct IsInfinite <: IteratorSize end

	# These convert your iterable into a trait that describes it
	IteratorSize(x) = IteratorSize(typeof(x))
	IteratorSize(::Type) = HasLength() # HasLength is the default
	IteratorSize(::Type{<:AbstractArray{<:Any,N}}) where {N} = HasShape{N}()
	IteratorSize(::Type{Base.Generator{I,F}}) where {I,F} = IteratorSize(I)
	IteratorSize(::Type{Any}) = SizeUnknown()

	# Doc for these traits reads:
	"""
	Given the type of an iterator, return one of the following values:

	* `SizeUnknown()` if the length (number of elements) cannot be determined in advance.
	* `HasLength()` if there is a fixed, finite length.
	* `HasShape{N}()` if there is a known length plus a notion of multidimensional shape (as for an array).
	  In this case `N` should give the number of dimensions, and the [`axes`](@ref) function is valid for the iterator.
	* `IsInfinite()` if the iterator yields values forever.
	"""

	# Those traits can then be used to dispatch to the correct functions and implement some form of "type arithmetic", for example when you want to use the zip function to iterate over two iterables
	zip_iteratorsize(::HasLength, ::IsInfinite) = HasLength()
	zip_iteratorsize(::HasShape, ::IsInfinite) = HasLength()
	zip_iteratorsize(::IsInfinite, ::IsInfinite) = IsInfinite()
	zip_iteratorsize(a::IsInfinite, b) = zip_iteratorsize(b,a)
end

# ╔═╡ 00000000-0000-0000-0000-000000000001
PLUTO_PROJECT_TOML_CONTENTS = """
[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59"

[compat]
BenchmarkTools = "~1.6.0"
BioSequences = "~3.4.1"
"""

# ╔═╡ 00000000-0000-0000-0000-000000000002
PLUTO_MANIFEST_TOML_CONTENTS = """
# This file is machine-generated - editing it directly is not advised

julia_version = "1.10.5"
manifest_format = "2.0"
project_hash = "a44b2a6f5e61f902a989ef8f2085e1656bae2c32"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[deps.BenchmarkTools]]
deps = ["Compat", "JSON", "Logging", "Printf", "Profile", "Statistics", "UUIDs"]
git-tree-sha1 = "e38fbc49a620f5d0b660d7f543db1009fe0f8336"
uuid = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
version = "1.6.0"

[[deps.BioSequences]]
deps = ["BioSymbols", "PrecompileTools", "Random", "Twiddle"]
git-tree-sha1 = "b602272be9915fbf41923d9fa67f16309bdf25d6"
uuid = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59"
version = "3.4.1"

[[deps.BioSymbols]]
deps = ["PrecompileTools"]
git-tree-sha1 = "e32a61f028b823a172c75e26865637249bb30dff"
uuid = "3c28c6f8-a34d-59c4-9654-267d177fcfa9"
version = "5.1.3"

[[deps.Compat]]
deps = ["TOML", "UUIDs"]
git-tree-sha1 = "8ae8d32e09f0dcf42a36b90d4e17f5dd2e4c4215"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "4.16.0"
weakdeps = ["Dates", "LinearAlgebra"]

    [deps.Compat.extensions]
    CompatLinearAlgebraExt = "LinearAlgebra"

[[deps.CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "1.1.1+0"

[[deps.Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[deps.JSON]]
deps = ["Dates", "Mmap", "Parsers", "Unicode"]
git-tree-sha1 = "31e996f0a15c7b280ba9f76636b3ff9e2ae58c9a"
uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
version = "0.21.4"

[[deps.Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[deps.LinearAlgebra]]
deps = ["Libdl", "OpenBLAS_jll", "libblastrampoline_jll"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[deps.Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[deps.Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"

[[deps.OpenBLAS_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"
version = "0.3.23+4"

[[deps.Parsers]]
deps = ["Dates", "PrecompileTools", "UUIDs"]
git-tree-sha1 = "8489905bcdbcfac64d1daa51ca07c0d8f0283821"
uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0"
version = "2.8.1"

[[deps.PrecompileTools]]
deps = ["Preferences"]
git-tree-sha1 = "5aa36f7049a63a1528fe8f7c3f2113413ffd4e1f"
uuid = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
version = "1.2.1"

[[deps.Preferences]]
deps = ["TOML"]
git-tree-sha1 = "9306f6085165d270f7e3db02af26a400d580f5c6"
uuid = "21216c6a-2e73-6563-6e65-726566657250"
version = "1.4.3"

[[deps.Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[deps.Profile]]
deps = ["Printf"]
uuid = "9abbd945-dff8-562f-b5e8-e1ebf5ef1b79"

[[deps.Random]]
deps = ["SHA"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[deps.SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
version = "0.7.0"

[[deps.Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[deps.SparseArrays]]
deps = ["Libdl", "LinearAlgebra", "Random", "Serialization", "SuiteSparse_jll"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
version = "1.10.0"

[[deps.Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
version = "1.10.0"

[[deps.SuiteSparse_jll]]
deps = ["Artifacts", "Libdl", "libblastrampoline_jll"]
uuid = "bea87d4a-7f5b-5778-9afe-8cc45184846c"
version = "7.2.1+1"

[[deps.TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
version = "1.0.3"

[[deps.Twiddle]]
git-tree-sha1 = "29509c4862bfb5da9e76eb6937125ab93986270a"
uuid = "7200193e-83a8-5a55-b20d-5d36d44a0795"
version = "1.1.2"

[[deps.UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

[[deps.Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
version = "5.11.0+0"
"""

# ╔═╡ Cell order:
# ╟─89309964-1d1b-4ddc-ad3e-3133ea411564
# ╟─db853800-fea3-11ef-1c7b-97635aa9381b
# ╟─5bc23e29-e1b4-4ec5-9c39-1b0c564b099e
# ╠═5db975e0-230b-417b-8d6d-fc53c8ffb94f
# ╠═52e24b55-c223-4f1b-9a30-8e86168347e2
# ╟─23ebb3ca-71af-4687-bc49-1b198f0d433e
# ╠═4646d54d-4e26-4ef2-a2e3-78222593ea0a
# ╠═74374ef3-904d-4c84-9d4a-3b9f23ad602e
# ╟─235e96fb-3356-43b1-8c38-370c9596d4b1
# ╟─bb17f4e0-4dec-42aa-b52c-b35cc0d07648
# ╠═f3ab35ab-6581-4dc8-9336-595a0d2c8ff3
# ╠═39e23646-5250-4cfa-9a85-990a4ff57c63
# ╠═7a6b4f63-3ad5-4597-a5d8-f08087470602
# ╠═9303b586-1a9a-4572-90a2-d43d39759582
# ╟─0d3e4d05-a09d-4bf6-bcd8-7409e1dd216f
# ╠═02d4572f-ec9b-4e1e-b033-e9c63cd3975e
# ╠═6536e72a-e9e0-46e7-a720-d35f5c2ff435
# ╟─264ad990-a527-4091-8526-155e03ab7e18
# ╠═828743bf-eae7-40d6-9d57-077be68ad521
# ╠═c1d8ad17-e45d-4e80-8840-8efb8d95b2ba
# ╠═d0f1b272-1859-4694-bf49-fe1dd896e806
# ╟─48319f28-25ea-453c-b9c3-6db4b6bac7a3
# ╟─cf149661-b082-40a9-8442-30605744520f
# ╠═9aed4853-9ec3-430a-91f4-2c359819fb08
# ╠═e7caed18-32c4-47b3-ac74-e739abff2101
# ╟─13c48240-a941-45f7-a695-93123f28906c
# ╠═0ddf88b5-529b-4918-b945-c58660f6699c
# ╠═d8cc6712-aa7b-429b-91c0-69bb9274ed0c
# ╠═8c992e8d-5755-44f5-b9ba-403713875313
# ╠═29475eb9-7c21-4a53-944f-fc6b6c30b685
# ╟─acb842c6-4d1f-4aca-8f1a-981a19dc831c
# ╠═9d5ebbb7-faf9-4a93-9d96-a2d1b68b7331
# ╠═f8cd2034-b816-48d1-b923-761263422a16
# ╟─7c63aba5-4d6b-4979-a4cd-58ce8ec47552
# ╠═07e2397f-2d03-4e8c-8a2d-fcd7980f93ae
# ╟─3700196b-46cb-477f-be70-4dd5983e1aef
# ╠═bea93f23-efea-45f4-8838-a12bae41101e
# ╠═806aaaae-312e-4a3b-8879-81b0b6be1de2
# ╠═0426b074-5e6a-4abc-9222-68f6e6dd0b9b
# ╠═f951b862-596d-40e8-b4b0-1735a0c428b6
# ╟─a23d7005-3159-405f-be4c-6e14d933e956
# ╠═1d5fae60-39cc-4ede-9fc8-d559d1c32330
# ╟─00000000-0000-0000-0000-000000000001
# ╟─00000000-0000-0000-0000-000000000002