Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Forum Posts

Anku_
by New Contributor II
  • 2253 Views
  • 2 replies
  • 0 kudos

New to PySpark

Hi all, I am trying to get the domain from an email field using the expression below, but I am getting an error. Kindly help. df.select(df.email, substring(df.email,instr(df.email,'@'),length(df.email).alias('domain')))

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

In your case, you want to extract the domain from the email, which starts from the position just after '@'. So, you should add 1 to the position of '@'. Also, the length of the substring should be the difference between the total length of the email ...

1 More Reply
kickbuttowski
by New Contributor II
  • 1575 Views
  • 1 reply
  • 0 kudos

Issue inferring schema for a streaming DataFrame from JSON files

Below is the pipeline design in Databricks and it's not working out. Kindly take a look and let me know whether it will work or not. I'm getting JSON files of different schemas in a directory under the root directory, and it reads all the files using...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Could you please share some sample of your dataset and code snippet of what you're trying to implement?

NoviKamayana
by New Contributor
  • 5231 Views
  • 0 replies
  • 0 kudos

Database: Delta Lake or PostgreSQL

Hey all, I am searching for a non-political answer to my database questions. Please know that I am a data newbie and literally do not know anything about this topic, but I want to learn, so please be gentle. Some context: I am working for an OEM that...

pernilak
by New Contributor III
  • 4889 Views
  • 2 replies
  • 3 kudos

Resolved! Pros and cons of physically separating data in different storage accounts and containers

When setting up Unity Catalog, Databricks recommends figuring out your data isolation model when it comes to physically separating your data into different storage accounts and/or containers. There are so many options, it can be hard to be ...

Latest Reply
raphaelblg
Databricks Employee
  • 3 kudos

Hello @pernilak, Thanks for reaching out to Databricks Community! My name is Raphael, and I'll be helping out. "Should all catalogs and the metastore reside in the same storage account (but different containers)?" Yes, Databricks recommends having o...

1 More Reply
swapnilmd
by New Contributor II
  • 1366 Views
  • 1 reply
  • 1 kudos

Databricks web editor's cell-like UI in a local IDE

I want to do Databricks-related development locally. There is an extension that allows running a local Python file on a remote Databricks cluster, but I also want the cell-like structure present in the Databricks UI for Python files in my local IDE....

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@swapnilmd You can use the VSCode extension for Databricks: https://docs.databricks.com/en/dev-tools/vscode-ext/index.html
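As a sketch of the file convention that gives the cell-like structure (the variables below are purely illustrative): a `.py` file whose first line is `# Databricks notebook source` is treated as notebook source, and each `# COMMAND ----------` line starts a new cell, both in the Databricks UI and in IDEs that support the format:

```python
# Databricks notebook source
# First cell: define some sample data (illustrative only).
raw = [1, 2, 3, 4]

# COMMAND ----------

# Second cell: aggregate it; in an IDE with notebook support each
# "# COMMAND ----------" block can be run independently.
total = sum(raw)
```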

Bhavishya
by New Contributor II
  • 4234 Views
  • 2 replies
  • 0 kudos

Databricks JDBC driver connection issue with Apache Solr

Hi, Databricks JDBC driver version 2.6.34. I am facing the below issue connecting to Databricks SQL from Apache Solr: Caused by: java.sql.SQLFeatureNotSupportedException: [Databricks][JDBC](10220) Driver does not support this optional feature. at com.databri...

Latest Reply
Bhavishya
New Contributor II
  • 0 kudos

The Databricks team recommended setting IgnoreTransactions=1 and autocommit=false in the connection string, but that didn't resolve the issue. Ultimately I had to use the Solr update API for uploading documents.
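For reference, a hedged sketch of how such properties are appended to the Databricks JDBC URL as semicolon-separated key=value pairs (the host and HTTP path below are placeholders, and the `jdbc:databricks://` prefix assumes a recent driver; older drivers use `jdbc:spark://`). Autocommit is often also toggled in code via `Connection.setAutoCommit(false)` rather than in the URL:

```python
# Placeholder values, not real endpoints.
host = "adb-1234567890123456.7.azuredatabricks.net"
http_path = "/sql/1.0/warehouses/abc123"

# Build the URL; properties after the database name are key=value pairs
# separated by semicolons.
jdbc_url = (
    f"jdbc:databricks://{host}:443/default"
    f";transportMode=http;ssl=1;httpPath={http_path}"
    ";AuthMech=3"
    ";IgnoreTransactions=1"
)
```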

1 More Reply
NhanNguyen
by Contributor III
  • 2045 Views
  • 3 replies
  • 1 kudos

Memory utilization still displayed in Metrics tab after terminating a cluster

Hi all, could you help me check this? I ran a cluster and then terminated it, but when I navigate to the cluster's Metrics tab I still see memory utilization metrics. Thanks.

Latest Reply
NhanNguyen
Contributor III
  • 1 kudos

Here are my cluster display and my simple notebook:

2 More Replies
Anku_
by New Contributor II
  • 3065 Views
  • 0 replies
  • 0 kudos

New to Spark

Hi all, I am new to Spark. I'm trying to write the code below but getting an error. Code: df1 = df.filter(df.col1 > 60 and df.col2 != 'abc') Any suggestions?

Stogpon
by New Contributor III
  • 8227 Views
  • 3 replies
  • 4 kudos

Resolved! Error not a delta table for Unity Catalog table

Is anyone able to advise why I am getting the error 'not a delta table'? The table was created in Unity Catalog. I've also tried DeltaTable.forName, and also using 13.3 LTS and 14.3 LTS clusters. Any advice would be much appreciated.

Latest Reply
addy
New Contributor III
  • 4 kudos

@Stogpon I believe if you are using DeltaTable.forPath then you have to pass the path where the table is. You can get this path from the Catalog; it is available in the Details tab of the table. Example: delta_table_path = "dbfs:/user/hive/warehouse/xyz...

2 More Replies
pernilak
by New Contributor III
  • 1841 Views
  • 0 replies
  • 0 kudos

Best practices for working with external locations where many files arrive constantly

I have an Azure Function that receives files (not volumes) and dumps them to cloud storage. Approximately one to five files are received per second. I want to create a partitioned table in Databricks to work with. How should I do this? E.g.: register the cont...

sanjay
by Valued Contributor II
  • 7919 Views
  • 9 replies
  • 0 kudos

Performance issue while calling mlflow endpoint

Hi, I have a PySpark DataFrame and a PySpark UDF which calls an MLflow model for each row, but its performance is too slow. Here is sample code: def myfunc(input_text): result = mlflowmodel.predict(input_text); return result; myfuncUDF = udf(myfunc, StringType(...

Latest Reply
Isabeente
New Contributor II
  • 0 kudos

So good

8 More Replies
Ramakrishnan83
by New Contributor III
  • 2764 Views
  • 1 reply
  • 0 kudos

Resolved! Understanding Spark Architecture during Table Creation

Team, I am trying to understand how the parquet files and the JSON under the delta log folder store the data behind the scenes. Table creation: from delta.tables import * DeltaTable.create(spark) \ .tableName("employee") \ .addColumn("id", "INT") \ .addColumn("na...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@Ramakrishnan83 - Kindly go through the blog post https://www.databricks.com/blog/2019/08/21/diving-into-delta-lake-unpacking-the-transaction-log.html, which discusses Delta's transaction log in detail.

