Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Forum Posts

Khalil
by Contributor
  • 9347 Views
  • 5 replies
  • 7 kudos

Incremental ingestion of Snowflake data with Delta Live Table (CDC)

Hello, I have some data stored in Snowflake, and I want to apply CDC to it using Delta Live Tables, but I am having some issues. Here is what I am trying to do:
@dlt.view()
def table1():
    return spark.read.format("snowflake").options(**opt...

Latest Reply
-werners-
Esteemed Contributor III
  • 7 kudos

CDC in Delta Live Tables works fine for Delta tables, as you have noticed. However, it is not a full-blown CDC implementation. If you want to capture changes in Snowflake, you will have to implement some CDC method on Snowflake itself, and read...

4 More Replies
Anku_
by New Contributor II
  • 2741 Views
  • 2 replies
  • 0 kudos

New to PySpark

Hi all, I am trying to get the domain from an email field using the expression below, but I am getting an error. Kindly help. df.select(df.email, substring(df.email,instr(df.email,'@'),length(df.email).alias('domain')))

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

In your case, you want to extract the domain from the email, which starts from the position just after '@'. So, you should add 1 to the position of '@'. Also, the length of the substring should be the difference between the total length of the email ...
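
The arithmetic described in this reply can be sketched as follows. This is a minimal sketch, not the reply's own code: the reply is truncated, so the length formula (total length minus the position of '@') is an assumption, and the names `DOMAIN_EXPR`, `with_domain`, and `domain_of` are hypothetical. Note also that in the question's snippet `.alias('domain')` is attached to `length(...)` rather than to the whole `substring(...)` call.

```python
# Spark SQL expression: instr() is 1-based, so "+ 1" starts just after '@'.
# The length argument is assumed to be total length minus the '@' position.
DOMAIN_EXPR = (
    "substring(email, instr(email, '@') + 1, "
    "length(email) - instr(email, '@'))"
)


def with_domain(df):
    """Return 'email' plus a 'domain' column.

    df is assumed to be a Spark DataFrame with a string column named 'email'.
    """
    return df.selectExpr("email", f"{DOMAIN_EXPR} AS domain")


def domain_of(email: str) -> str:
    """Plain-Python mirror of DOMAIN_EXPR, useful for checking the arithmetic."""
    at_pos = email.find("@") + 1          # 1-based position, 0 if '@' is absent
    length = len(email) - at_pos          # characters remaining after '@'
    return email[at_pos:at_pos + length]  # 0-based slice starting just after '@'
```

With a real DataFrame this would be called as `with_domain(df).show()`. Passing the whole expression as a SQL string avoids mixing Python integers and Column objects in the `substring` arguments.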

1 More Replies
kickbuttowski
by New Contributor II
  • 1863 Views
  • 1 reply
  • 0 kudos

Issue in inferring schema for streaming dataframe using json files

Below is the pipeline design in Databricks, and it's not working out. Kindly look at it and let me know whether it will work. I'm getting JSON files with different schemas from directories under the root directory, and it reads all the files using...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Could you please share a sample of your dataset and a code snippet of what you're trying to implement?

NoviKamayana
by New Contributor
  • 5858 Views
  • 0 replies
  • 0 kudos

Database: Delta Lake or PostgreSQL

Hey all, I am searching for a non-political answer to my database questions. Please know that I am a data newbie and literally do not know anything about this topic, but I want to learn, so please be gentle. Some context: I am working for an OEM that...

pernilak
by New Contributor III
  • 5982 Views
  • 2 replies
  • 3 kudos

Resolved! Pros and cons of physically separating data in different storage accounts and containers

When setting up Unity Catalog, Databricks recommends figuring out your data isolation model when it comes to physically separating your data into different storage accounts and/or containers. There are so many options, it can be hard to be ...

Latest Reply
raphaelblg
Databricks Employee
  • 3 kudos

Hello @pernilak, thanks for reaching out to Databricks Community! My name is Raphael, and I'll be helping out. Should all catalogs and the metastore reside in the same storage account (but different containers)? Yes, Databricks recommends having o...

1 More Replies
swapnilmd
by New Contributor II
  • 1600 Views
  • 1 reply
  • 1 kudos

Databricks Web Editor's Cell like UI in local IDE

I want to do Databricks-related development locally. There is an extension that allows running a local Python file on a remote Databricks cluster. But I want to have the cell-like structure that is present in the Databricks UI for Python files in my local IDE as well....

Latest Reply
daniel_sahal
Databricks MVP
  • 1 kudos

@swapnilmd You can use the VS Code extension for Databricks: https://docs.databricks.com/en/dev-tools/vscode-ext/index.html

Bhavishya
by New Contributor II
  • 5677 Views
  • 2 replies
  • 0 kudos

Databricks JDBC driver connection issue with Apache Solr

Hi, Databricks JDBC version: 2.6.34. I am facing the issue below when connecting to Databricks SQL from Apache Solr: Caused by: java.sql.SQLFeatureNotSupportedException: [Databricks][JDBC](10220) Driver does not support this optional feature. at com.databri...

Latest Reply
Bhavishya
New Contributor II
  • 0 kudos

The Databricks team recommended setting IgnoreTransactions=1 and autocommit=false in the connection string, but that didn't resolve the issue. Ultimately I had to use the Solr update API for uploading documents.

1 More Replies
NhanNguyen
by Contributor III
  • 2364 Views
  • 3 replies
  • 1 kudos

[Memory utilization in Metrics tab still displayed after terminating a cluster]

Hi all, could you help me check this? I ran a cluster and then terminated it, but when I navigate to the cluster's Metrics tab, I still see Memory utilization showing metrics. Thanks

Latest Reply
NhanNguyen
Contributor III
  • 1 kudos

Here are my cluster display and my simple notebook:

2 More Replies
Anku_
by New Contributor II
  • 3284 Views
  • 0 replies
  • 0 kudos

New to Spark

Hi all, I am new to Spark and am trying to write the code below, but I am getting an error. Code: df1 = df.filter(df.col1 > 60 and df.col2 != 'abc') Any suggestions?
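
A likely cause of the error above: Python's `and` tries to convert each Column expression to a plain boolean, which PySpark Columns do not allow. Conditions on columns are combined with `&` (or `|`), and each comparison needs its own parentheses because `&` binds more tightly than `>` and `!=`. A minimal sketch, assuming `df` is a Spark DataFrame with a numeric `col1` and a string `col2`; the function name `keep_matching_rows` is hypothetical:

```python
def keep_matching_rows(df):
    """Keep rows where col1 > 60 and col2 is not 'abc'.

    Uses the bitwise operator & to build one combined Column condition,
    with parentheses around each comparison.
    """
    return df.filter((df.col1 > 60) & (df.col2 != "abc"))
```

Usage would be `df1 = keep_matching_rows(df)`. The same condition could also be passed as a SQL string, `df.filter("col1 > 60 AND col2 != 'abc'")`.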

Stogpon
by New Contributor III
  • 9555 Views
  • 3 replies
  • 4 kudos

Resolved! Error not a delta table for Unity Catalog table

Is anyone able to advise why I am getting the error "not a Delta table"? The table was created in Unity Catalog. I've also tried DeltaTable.forName, and I've tried both 13.3 LTS and 14.3 LTS clusters. Any advice would be much appreciated.

Latest Reply
addy
New Contributor III
  • 4 kudos

@Stogpon I believe if you are using DeltaTable.forPath then you have to pass the path where the table is. You can get this path from the Catalog. It is available in the Details tab of the table. Example: delta_table_path = "dbfs:/user/hive/warehouse/xyz...

2 More Replies
pernilak
by New Contributor III
  • 2044 Views
  • 0 replies
  • 0 kudos

Best practices for working with external locations where many files arrive constantly

I have an Azure Function that receives files (not volumes) and dumps them to cloud storage. Approximately one to five files are received per second. I want to create a partitioned table in Databricks to work with. How should I do this? E.g.: register the cont...

sanjay
by Valued Contributor II
  • 8979 Views
  • 9 replies
  • 0 kudos

Performance issue while calling mlflow endpoint

Hi, I have a PySpark DataFrame and a PySpark UDF that calls an MLflow model for each row, but its performance is too slow. Here is sample code:
def myfunc(input_text):
    result = mlflowmodel.predict(input_text)
    return result
myfuncUDF = udf(myfunc,StringType(...

Latest Reply
Isabeente
New Contributor II
  • 0 kudos

So good

8 More Replies
