Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

alphaRomeo
by New Contributor
  • 3743 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks with MySQL data source?

I have an existing data pipeline that looks like this: a small MySQL data source (around 250 GB) whose data passes through Debezium / Kafka / a custom data redactor -> Glue ETL jobs and finally lands in Redshift, but the scale of the data is too sm...

Latest Reply
Dan_Z
Databricks Employee
  • 0 kudos

There is a lot in this question, so generally speaking I suggest you reach out to the sales team at Databricks. You can talk to a solutions architect who can go into more detail. Here are my general thoughts, having seen a lot of customer architectures: Generally,...

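For reference, Spark can read a MySQL table directly over JDBC, which is one way to pull a source of this size into Databricks. A minimal sketch, assuming the MySQL Connector/J driver is attached to the cluster; the host, database, table, and credentials are hypothetical placeholders, and the helper names `mysql_jdbc_options` and `read_mysql_table` are illustrative only. The keys `url`, `dbtable`, `user`, `password`, and `driver` are standard Spark JDBC data source options.

```python
def mysql_jdbc_options(host, database, table, user, password, port=3306):
    """Build the option map for Spark's JDBC reader (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Connector/J 8.x driver class; assumes the driver JAR is on the cluster.
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def read_mysql_table(spark, **kwargs):
    """Load one MySQL table as a DataFrame via Spark's built-in JDBC source."""
    return spark.read.format("jdbc").options(**mysql_jdbc_options(**kwargs)).load()


opts = mysql_jdbc_options("mysql.example.internal", "shop", "orders", "reader", "secret")
print(opts["url"])  # jdbc:mysql://mysql.example.internal:3306/shop
```

In a notebook, `read_mysql_table(spark, host=..., database=..., table=..., user=..., password=...)` would return a DataFrame; in practice the password would come from a secret scope rather than a literal.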
EvandroLippert_
by New Contributor
  • 1716 Views
  • 1 replies
  • 0 kudos

Conflict with bitbucket and github credentials

I'm migrating my files from Bitbucket to GitHub, but every time I need to clone something from Bitbucket and send it to GitHub, I need to create a new token to integrate the tools. It seems that when you save a GitHub credential, it overrides t...

Latest Reply
alexott
Databricks Employee
  • 0 kudos

Cross-posting my answer from Stack Overflow: Unfortunately, right now it works with only a single Git provider. It looks like you're linking individual notebooks into a Git repository. You can simplify things by cloning the Bitbucket repository(-ies)...

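Since the workspace works with only one Git provider at a time, one way to sidestep the credential conflict is to move each repository from Bitbucket to GitHub once, outside Databricks, with a mirror clone and mirror push. This is a swapped-in technique, not the one from the truncated answer; the URLs and the helper `mirror_commands` are hypothetical. The sketch only builds the `git` commands; each could be run with `subprocess.run(cmd, check=True)` on a machine that has credentials for both providers.

```python
import shlex


def mirror_commands(source_url, target_url, workdir="repo.git"):
    """Return the git commands that copy every branch and tag from source to target."""
    return [
        # A bare mirror clone fetches all refs (branches, tags, notes).
        ["git", "clone", "--mirror", source_url, workdir],
        # Push every ref so the target's refs match the mirror exactly.
        ["git", "-C", workdir, "push", "--mirror", target_url],
    ]


for cmd in mirror_commands(
    "https://bitbucket.org/example-team/pipeline.git",
    "https://github.com/example-org/pipeline.git",
):
    print(shlex.join(cmd))
```

The GitHub repository should be created empty beforehand, because `push --mirror` makes the target's refs match the source, deleting any that exist only on the target.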
Alex_G
by New Contributor II
  • 2107 Views
  • 1 replies
  • 4 kudos

Resolved! Databricks Feature Store in MLflow run CLI command

Hello! I am attempting to move some machine learning code from a Databricks notebook into an MLflow Git repository. I am using the Databricks Feature Store to load features that have been processed. Currently I cannot get the databricks library to ...

Latest Reply
sean_owen
Databricks Employee
  • 4 kudos

Hm, what error do you get? I believe you won't be able to specify the Feature Store library as a dependency, as it's not externally published yet, but code that uses it should run on Databricks ML runtimes, where it is already installed.

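Following the reply, one way to keep the MLflow project from declaring the unpublished library as a dependency is to import the Feature Store client lazily and fail with a clear message when the code is not running on a Databricks ML runtime. A minimal sketch; `databricks.feature_store.FeatureStoreClient` is the module path used on ML runtimes around the time of this thread, and the helper names are hypothetical.

```python
import importlib


def feature_store_available(module_name="databricks.feature_store"):
    """Return True if the Feature Store client module can be imported here."""
    try:
        importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError
        return False
    return True


def get_feature_store_client():
    """Create a FeatureStoreClient, or explain why it is unavailable."""
    if not feature_store_available():
        raise RuntimeError(
            "databricks.feature_store is not importable; run this entry point "
            "on a Databricks ML runtime, where the library is preinstalled."
        )
    from databricks.feature_store import FeatureStoreClient

    return FeatureStoreClient()
```

Keeping the import inside the function means the rest of the project can still be imported and tested locally, and only the feature-loading step requires the ML runtime.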
irfanaziz
by Contributor II
  • 1767 Views
  • 2 replies
  • 3 kudos

Does anyone know why OPTIMIZE does not complete?

I feel there is some issue with a few partitions of the Delta table. OPTIMIZE runs fine and completes within a few minutes for other partitions, but for this particular partition it keeps running forever. OPTIMIZE delta.`/mnt/prod-abc/Ini...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

@nafri A - Thank you for letting us know.

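To isolate a misbehaving partition, OPTIMIZE can be limited to a single partition: Delta Lake accepts a `WHERE` clause on OPTIMIZE as long as it references only partition columns. A sketch under assumptions: the table path, the partition column `ingest_date`, and the helper `optimize_partition_sql` are hypothetical, since the original path is truncated.

```python
def optimize_partition_sql(table_path, partition_col, value):
    """Build an OPTIMIZE statement limited to one partition of a path-based Delta table."""
    # Assumes `value` is a trusted literal without quotes; no SQL escaping is done.
    return f"OPTIMIZE delta.`{table_path}` WHERE {partition_col} = '{value}'"


sql = optimize_partition_sql("/mnt/example/table", "ingest_date", "2021-06-01")
print(sql)  # OPTIMIZE delta.`/mnt/example/table` WHERE ingest_date = '2021-06-01'
# In a notebook: spark.sql(sql)
```

Running the slow partition on its own makes it easier to compare its file count and sizes with a partition that finishes quickly.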
User16137833804
by Databricks Employee
  • 3488 Views
  • 3 replies
  • 1 kudos
Latest Reply
Sebastian
Contributor
  • 1 kudos

The best solution is to store the .whl file locally and pip install that local wheel while the server boots up. This freezes the library version; if you install from PyPI instead, a newer release might impact your production work.

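The reply's approach, installing a wheel stored with the cluster rather than resolving the package from PyPI at startup, can be sketched as a pip command. Assumptions: the wheel path and the helper `local_wheel_install_cmd` are hypothetical; `--no-index` stops pip from contacting PyPI, and `--find-links` lets it resolve any dependencies from wheels in the same directory.

```python
import os
import sys


def local_wheel_install_cmd(wheel_path):
    """Build a pip command that installs one wheel without contacting PyPI."""
    wheel_dir = os.path.dirname(wheel_path) or "."
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never reach out to PyPI
        "--find-links", wheel_dir,  # resolve dependencies from the same folder
        wheel_path,
    ]


cmd = local_wheel_install_cmd("/dbfs/FileStore/wheels/mylib-1.2.0-py3-none-any.whl")
print(" ".join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

On a cluster, the equivalent command would typically live in an init script so that every node installs the same pinned version at boot.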
User16856693631
by New Contributor II
  • 1675 Views
  • 1 replies
  • 0 kudos

Can you create Clusters via a REST API?

Yes, you can. See here: https://docs.databricks.com/dev-tools/api/latest/clusters.html. The JSON payload would look as follows: { "cluster_name": "my-cluster", "spark_version": "7.3.x-scala2.12", "node_type_id": "i3.xlarge", "spark_conf": { ...

Latest Reply
ManishPatil
New Contributor II
  • 0 kudos

You can create clusters using the Clusters API: https://docs.databricks.com/dev-tools/api/latest/clusters.html#create. However, REST API 2.0 doesn't provide certain features like "Enable Table Access Control", which has been introduced after REST API ...

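The create call discussed in this thread can be sketched with only the Python standard library. Assumptions: the workspace URL and token are placeholders; `spark_version` and `node_type_id` are copied from the question's payload, while `num_workers` is added here because the original payload is cut off. The helper name is hypothetical, and the sketch builds the request without sending it.

```python
import json
import urllib.request


def build_create_cluster_request(host, token, payload):
    """Prepare (but do not send) a POST to the Clusters API create endpoint."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "cluster_name": "my-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,  # assumption: fixed-size cluster; "autoscale" is the alternative
}
req = build_create_cluster_request("https://example.cloud.databricks.com", "dapi-REDACTED", payload)
# Sending it: urllib.request.urlopen(req) returns JSON that includes the new "cluster_id".
```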
BorislavBlagoev
by Valued Contributor III
  • 3477 Views
  • 2 replies
  • 1 kudos

DBUtils cannot find widgets [Windows 10]

I use Databricks Connect to connect PyCharm to a Databricks cluster remotely, but when I try to get dbutils.widgets it throws an error. Cluster conf: spark.databricks.service.server.enabled true spark.databricks.hive.metastore.glueCatalog.enabled true ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

This is normal behavior. databricks-connect does not support the whole dbutils class: https://docs.databricks.com/dev-tools/databricks-connect.html#access-dbutils. Widgets are not on the list.

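Because widgets are not available through databricks-connect, one workaround is to read parameters through a small helper that uses `dbutils.widgets.get` when it works and falls back to environment variables otherwise. A sketch under assumptions: the helper `get_param` and the convention of mapping a widget name to an upper-case environment variable are hypothetical; `dbutils.widgets.get(name)` is the standard notebook call.

```python
import os


def get_param(name, default=None, dbutils=None):
    """Read a parameter from a notebook widget, or fall back to an env var.

    Widgets are unavailable through databricks-connect, so when `dbutils` is
    missing or the widget lookup fails, read NAME.upper() from the environment.
    """
    if dbutils is not None:
        try:
            return dbutils.widgets.get(name)
        except Exception:  # widget not defined, or widgets unsupported here
            pass
    return os.environ.get(name.upper(), default)
```

In a notebook you would call `get_param("run_date", dbutils=dbutils)`; from PyCharm you would omit `dbutils` and set `RUN_DATE` in the run configuration instead.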
bciampa
by New Contributor II
  • 20135 Views
  • 1 replies
  • 1 kudos

Unable to infer schema for Parquet at

I have this code in a notebook:
val streamingDataFrame = incomingStream.selectExpr("cast (body as string) AS Content")
  .withColumn("Sentiment", toSentiment($"Content"))
import org.apache.spark.sql.streaming.Trigger.ProcessingTime
val result = stre...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

This looks like an invalid Parquet file. My guess is that the incoming data has mixed types (for the same column) or a different/invalid structure.

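To check the reply's guess that the incoming data mixes types within a column, a quick scan over a sample of decoded records can report the offending columns. A minimal sketch in plain Python; the helper `mixed_type_columns` and the sample records are hypothetical, and in the streaming job the sample would come from the parsed message bodies.

```python
from collections import defaultdict


def mixed_type_columns(records):
    """Return {column: sorted type names} for columns holding more than one non-null type."""
    seen = defaultdict(set)
    for record in records:
        for column, value in record.items():
            if value is not None:  # nulls do not conflict with any type
                seen[column].add(type(value).__name__)
    return {col: sorted(types) for col, types in seen.items() if len(types) > 1}


sample = [
    {"Content": "great product", "Sentiment": 0.9},
    {"Content": "meh", "Sentiment": "neutral"},
    {"Content": None, "Sentiment": 0.1},
]
print(mixed_type_columns(sample))  # {'Sentiment': ['float', 'str']}
```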
cconnell
by Contributor II
  • 535 Views
  • 0 replies
  • 1 kudos


I wrote a review of Koalas by porting an existing pandas program. Comments welcome: https://medium.com/@chuck.connell.3/pandas-on-databricks-via-koalas-a-review-9876b0a92541

haseebkhan1421
by New Contributor
  • 2427 Views
  • 1 replies
  • 3 kudos

How can I create a column on the fly that has the same value for all rows in a Spark SQL query?

I have a SQL query that I am converting into Spark SQL in Azure Databricks, running in my Jupyter notebook. In my SQL query, a column named Type is created on the fly, with the value 'Goal' for every row: SELECT Type='Goal', Value FROM table. Now, when...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 3 kudos

The correct syntax would be: SELECT 'Goal' AS Type, Value FROM table

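The corrected query works in any SQL engine that accepts a string literal with a column alias. A minimal sketch using Python's built-in `sqlite3` as a stand-in for Spark SQL; the table name `goals` and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goals (Value INTEGER)")
conn.executemany("INSERT INTO goals (Value) VALUES (?)", [(10,), (20,)])

# A literal in the SELECT list becomes a column with the same value on every row.
rows = conn.execute("SELECT 'Goal' AS Type, Value FROM goals ORDER BY Value").fetchall()
print(rows)  # [('Goal', 10), ('Goal', 20)]
```

With the PySpark DataFrame API, the equivalent is `df.withColumn("Type", lit("Goal"))`, using `lit` from `pyspark.sql.functions`.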
maheshwor
by New Contributor III
  • 1201 Views
  • 1 replies
  • 2 kudos

Resolved! Databricks Views

How do we find the definition of a view in Databricks?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 2 kudos

You can use the extended table description. For example, the following Python code will print the current definition of the view:
table_name = ""
df = spark.sql("describe table extended {}".format(table_name))
df.createOrReplaceTempView("view_desript...

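A minimal sketch of picking the definition out of that output, assuming the rows returned by `DESCRIBE TABLE EXTENDED` for a view include one whose `col_name` is `View Text`, as in recent Spark versions; the helper `view_text` and the sample rows are hypothetical. In recent Spark versions, `SHOW CREATE TABLE <view>` is an alternative that returns the full `CREATE VIEW` statement.

```python
def view_text(describe_rows):
    """Return the view definition from DESCRIBE TABLE EXTENDED rows, or None."""
    for col_name, data_type, _comment in describe_rows:
        if col_name.strip() == "View Text":
            return data_type  # the SQL text is reported in the data_type column
    return None


# In a notebook the rows would come from:
#   spark.sql(f"DESCRIBE TABLE EXTENDED {table_name}").collect()
sample_rows = [
    ("id", "bigint", None),
    ("", "", ""),
    ("# Detailed Table Information", "", ""),
    ("Type", "VIEW", ""),
    ("View Text", "SELECT id FROM sales WHERE amount > 0", ""),
]
print(view_text(sample_rows))  # SELECT id FROM sales WHERE amount > 0
```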

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
