cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Zircoz
by New Contributor II
  • 11619 Views
  • 2 replies
  • 6 kudos

Resolved! Can we access the variables created in Python in Scala's code or notebook ?

If I have a dict created in python on a Scala notebook (using magic word ofcourse):%python d1 = {1: "a", 2:"b", 3:"c"}Can I access this d1 in Scala ?I tried the following and it returns d1 not found:%scala println(d1)

  • 11619 Views
  • 2 replies
  • 6 kudos
Latest Reply
cpm1
New Contributor II
  • 6 kudos

Martin is correct. We could only access the external files and objects. In most of our cases, we just use temporary views to pass data between R & Python.https://docs.databricks.com/notebooks/notebooks-use.html#mix-languages

  • 6 kudos
1 More Replies
Anonymous
by Not applicable
  • 2080 Views
  • 1 replies
  • 2 kudos

Are there any costs or quotas associated with the Databricks managed Hive metastore?

When using the default hive metastore that is managed within the Databricks control plane are there any associated costs? I.e. if I switched to an external metastore would I expect to see any reduction in my Databricks cost (ignoring total costs).Do ...

  • 2080 Views
  • 1 replies
  • 2 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 2 kudos

There are no costs associated by using the Databricks managed Hive metastore directly. Databricks pricing is on a compute consumption and not on data storage or access. The only real cost would be the compute used to access the data. I would not expe...

  • 2 kudos
Techmate
by New Contributor
  • 957 Views
  • 1 replies
  • 0 kudos

Populating a array of date tuples Scala

Hi Friends i am trying to pass a list of date ranges needs to be in the below format. val predicates =Array(“2021-05-16” → “2021-05-17”,“2021-05-18” → “2021-05-19”,“2021-05-20” → “2021-05-21”) I am then using map to create a range of conditions that...

  • 957 Views
  • 1 replies
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

So basically this can be done by generating 2 lists which are then zipped.One list contains the first dates of the tuples, so these are in your case 2 days apart.The other list is the 2nd dates of the tuples, also 2 days apart.Now we need a function ...

  • 0 kudos
User16788316474
by New Contributor II
  • 898 Views
  • 1 replies
  • 1 kudos
  • 898 Views
  • 1 replies
  • 1 kudos
Latest Reply
gbrueckl
Contributor II
  • 1 kudos

I think this was added Databricks Runtime 8.2https://docs.databricks.com/release-notes/runtime/8.2.html

  • 1 kudos
alphaRomeo
by New Contributor
  • 2372 Views
  • 2 replies
  • 0 kudos

Resolved! DataBricks with MySQL data source?

I have an existing data pipeline which looks like this: A small MySQL data source (around 250 GB) and data passes through Debezium/ Kafka / a custom data redactor -> to Glue ETL jobs and finally lands on Redshift, but the scale of the data is too sm...

  • 2372 Views
  • 2 replies
  • 0 kudos
Latest Reply
Dan_Z
Honored Contributor
  • 0 kudos

There is a lot in this question, so generally speaking I suggest you reach out to the sales team at Databricks. You can talk to a solutions architect who get into more detail. Here are my general thoughts having seen a lot of customer arch:Generally,...

  • 0 kudos
1 More Replies
EvandroLippert_
by New Contributor
  • 1385 Views
  • 2 replies
  • 0 kudos

Conflict with bitbucket and github credentials

I'm migrating my files from Bitbucket to Github, but every time that I need to clone something from bitbucket and send it to GitHub, I need to create a new token to integrate the tools. It seems that when you save a Github credential, it overrides t...

  • 1385 Views
  • 2 replies
  • 0 kudos
Latest Reply
alexott
Valued Contributor II
  • 0 kudos

Cross-posting my answer from StackOverflow:Unfortunately right now it works only with a single Git provider. It looks like that you're linking individual notebooks into Git repository. You can simplify things by cloning the Bitbucket repository(-ies)...

  • 0 kudos
1 More Replies
irfanaziz
by Contributor II
  • 1420 Views
  • 3 replies
  • 3 kudos

Does anyone know why the optimize does not complete?

I feel there is some issue with a few partitions of the delta file. The optimize runs fine and completes within few minutes for other partitions but for this particular partition the optimize keeps running forever. OPTIMIZE delta.`/mnt/prod-abc/Ini...

  • 1420 Views
  • 3 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

@nafri A​ - Thank you for letting us know.

  • 3 kudos
2 More Replies
max651
by New Contributor
  • 992 Views
  • 1 replies
  • 0 kudos

How to create circles and find diameter in point cloud?

Good day, Everybody. I have a task I should create many circles in a point cloud (my file can be .csv) and calculate their diameter in the area 100m x 100 m. I have the coordinates of the starting point. The circles should be created at the height ...

0693f000007OoQoAAK
  • 992 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @ max651 ! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the community have an answer to your question first. Or else I will follow up shortly with a response.

  • 0 kudos
User16137833804
by New Contributor III
  • 2737 Views
  • 3 replies
  • 1 kudos
  • 2737 Views
  • 3 replies
  • 1 kudos
Latest Reply
Sebastian
Contributor
  • 1 kudos

the best solution is to store the .whl locally and do a pip install of the local whl while server boots up. this will freeze the library version. if you install from the pip it might impact your production work.

  • 1 kudos
2 More Replies
User16856693631
by New Contributor II
  • 1271 Views
  • 2 replies
  • 0 kudos

Can you create Clusters via a REST API?

Yes, you can. See here: https://docs.databricks.com/dev-tools/api/latest/clusters.htmlThe JSON payload would look as follows:{ "cluster_name": "my-cluster", "spark_version": "7.3.x-scala2.12", "node_type_id": "i3.xlarge", "spark_conf": { ...

  • 1271 Views
  • 2 replies
  • 0 kudos
Latest Reply
ManishPatil
New Contributor II
  • 0 kudos

One can create a Cluster(s) using CLuster API @ https://docs.databricks.com/dev-tools/api/latest/clusters.html#create However, REST API 2.0 doesn't provide certain features like "Enable Table Access Control", which has been introduced after REST API ...

  • 0 kudos
1 More Replies
BorislavBlagoev
by Valued Contributor III
  • 2871 Views
  • 3 replies
  • 1 kudos

DBUtils cannot find widgets [Windows 10]

I use databricks connect to connect PyCharm with databricks cluster remotely but when I try to get dbutils.widget throw an error. cluster conf: spark.databricks.service.server.enabled true spark.databricks.hive.metastore.glueCatalog.enabled true ...

  • 2871 Views
  • 3 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

This is normal behavior. databricks-connect does not support the whole dbutils class.https://docs.databricks.com/dev-tools/databricks-connect.html#access-dbutilsWidgets are not on the list.

  • 1 kudos
2 More Replies
bciampa
by New Contributor II
  • 18902 Views
  • 2 replies
  • 1 kudos

Unable to infer schema for Parquet at

I have this code in a notebook:val streamingDataFrame = incomingStream.selectExpr("cast (body as string) AS Content") .withColumn("Sentiment", toSentiment($"Content")) import org.apache.spark.sql.streaming.Trigger.ProcessingTime val result = stre...

  • 18902 Views
  • 2 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

seems like an invalid parquet file. my guess is the incoming data has mixed types (for the same column) or a different/invalid structure.

  • 1 kudos
1 More Replies
cconnell
by Contributor II
  • 399 Views
  • 0 replies
  • 1 kudos

medium.com

I wrote a review of Koalas by porting an existing pandas program. Comments welcome.https://medium.com/@chuck.connell.3/pandas-on-databricks-via-koalas-a-review-9876b0a92541

  • 399 Views
  • 0 replies
  • 1 kudos
mokshaessential
by New Contributor
  • 303 Views
  • 0 replies
  • 0 kudos

mokshaessentials.com

Mokshaessentials is one of the best essential oils providers that provides 100 % pure & natural essential oils.Also, buy essential oils only @Mokshaessentials.Web:- https://mokshaessentials.com/#naturalessentialoils,#buyessentialoils, #bestessentialo...

  • 303 Views
  • 0 replies
  • 0 kudos
Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!

Labels
Top Kudoed Authors