Community Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

Forum Posts

Anku_
by New Contributor II
  • 1011 Views
  • 2 replies
  • 0 kudos

New to PySpark

Hi all, I am trying to get the domain from an email field using the expression below, but I'm getting an error. Kindly help. df.select(df.email, substring(df.email, instr(df.email, '@'), length(df.email).alias('domain')))

Latest Reply
Walter_C
Honored Contributor
  • 0 kudos

In your case, you want to extract the domain from the email, which starts from the position just after '@'. So, you should add 1 to the position of '@'. Also, the length of the substring should be the difference between the total length of the email ...
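For illustration, the 1-based index arithmetic behind Spark's instr() and substring() can be sketched in plain Python (this mimics the Spark functions' semantics; it is not PySpark itself, and the sample email address is made up):

```python
# Plain-Python mimic of Spark's 1-based instr() and substring(),
# showing why you add 1 to the position of '@' and subtract it from
# the total length.
def instr(s: str, sub: str) -> int:
    """1-based position of sub in s, or 0 if absent (like Spark's instr)."""
    return s.find(sub) + 1

def substring(s: str, pos: int, length: int) -> str:
    """Spark-style substring: pos is 1-based."""
    return s[pos - 1:pos - 1 + length]

email = "user@example.com"          # sample value for illustration
at = instr(email, "@")              # 1-based position of '@'
domain = substring(email, at + 1, len(email) - at)  # start one past '@'
```

In PySpark this corresponds to substring(df.email, instr(df.email, '@') + 1, length(df.email) - instr(df.email, '@')).alias('domain') — note that .alias() goes on the whole expression, not inside length() as in the original question.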

1 More Replies
kickbuttowski
by New Contributor II
  • 672 Views
  • 1 replies
  • 0 kudos

Issue in inferring schema for streaming dataframe using json files

Below is the pipeline design in Databricks, and it's not working out; kindly look into this and let me know whether it will work. I'm getting JSON files of different schemas from directories under the root directory, and it reads all the files using...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Could you please share some sample of your dataset and code snippet of what you're trying to implement?

pernilak
by New Contributor III
  • 1488 Views
  • 2 replies
  • 3 kudos

Resolved! Pros and cons of physically separating data in different storage accounts and containers

When setting up Unity Catalog, Databricks recommends figuring out your data isolation model when it comes to physically separating your data into different storage accounts and/or containers. There are so many options, it can be hard to be ...

Latest Reply
raphaelblg
Honored Contributor
  • 3 kudos

Hello @pernilak, Thanks for reaching out to Databricks Community! My name is Raphael, and I'll be helping out. "Should all catalogs and the metastore reside in the same storage account (but different containers)?" Yes, Databricks recommends having o...

1 More Replies
Mesh
by New Contributor II
  • 3119 Views
  • 2 replies
  • 0 kudos

Optimizing for Recall in Azure AutoML UI

Hi all, I've been using Azure AutoML and noticed that I can choose 'recall' as my optimization metric in the notebook but not in the Azure AutoML UI. The Databricks documentation also doesn't list 'recall' as an optimization metric. Is there a reason ...

Latest Reply
Mesh
New Contributor II
  • 0 kudos

On the Databricks notebook itself, I can see that databricks.automl supports using recall as a primary metric: "Help on function classify in module databricks.automl: :param primary_metric: primary metric to select the best model. Each trial will..."

1 More Replies
HakanNordgren
by New Contributor II
  • 616 Views
  • 1 replies
  • 0 kudos

Both `hive_metastore` and `spark_catalog`?

Hi, is it possible for a workspace to have both a `hive_metastore` catalog and a `spark_catalog` catalog?

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @HakanNordgren, Hive Metastore Catalog (hive_metastore): The Hive Metastore is a legacy catalog that manages metadata for tables, databases, and partitions. It contains information about schemas, tables, and their associated properties. If your...

swapnilmd
by New Contributor
  • 614 Views
  • 1 replies
  • 0 kudos

Databricks Web Editor's Cell like UI in local IDE

I want to do Databricks-related development locally. There is an extension that allows running a local Python file on a remote Databricks cluster, but I also want the cell-like structure present in the Databricks UI for Python files in my local IDE....

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@swapnilmd You can use the VS Code extension for Databricks: https://docs.databricks.com/en/dev-tools/vscode-ext/index.html

NhanNguyen
by Contributor II
  • 891 Views
  • 3 replies
  • 1 kudos

[Memory utilization in Metrics Tab still display after terminate a cluster]

Hi all, could you help me check this? I ran a cluster and then terminated it, but when I navigate to the cluster's Metrics tab, I still see memory utilization metrics. Thanks

(attached screenshot: jensen22_0-1710993062168.png)
Latest Reply
NhanNguyen
Contributor II
  • 1 kudos

here are my cluster display and my simple notebook:

2 More Replies
HakanNordgren
by New Contributor II
  • 1065 Views
  • 4 replies
  • 0 kudos

databricks-jdbc lists `spark_catalog` among catalogs for Standard tier Azure workspace

databricks-jdbc lists `spark_catalog` among catalogs for Standard tier Azure workspace. The UI lists `hive_metastore`. It would be better if these two were consistent.

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @HakanNordgren, Let’s address the inconsistency between spark_catalog and hive_metastore in Databricks. Here’s what we know: spark_catalog: This catalog is associated with Databricks and is used for managing metadata related to tables, views, ...

3 More Replies
Chris_Konsur
by New Contributor III
  • 1193 Views
  • 3 replies
  • 1 kudos

an autoloader in file notification mode to get files from S3 on AWS -Error

I configured Auto Loader in file notification mode to get files from S3 on AWS:
spark.readStream \
    .format("cloudFiles") \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.inferColumnTypes", "true") \
    .option("cloudFiles.schemaLocation", "dbfs:/au...

Latest Reply
Selz
New Contributor II
  • 1 kudos

In case anyone else stumbles across this, I was able to fix my issue by setting up an instance profile with the file notification permissions and attaching the instance profile to the job cluster. It wasn't clear from the documentation that the file ...
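As a hedged sketch of what this fix looks like in practice: file notification mode is switched on with the cloudFiles.useNotifications option, and with an instance profile granting the notification permissions attached to the job cluster, no explicit credentials are passed in the options. The schema path, region, and bucket below are placeholders, not values from this thread:

```python
# Sketch: Auto Loader file-notification options (placeholder values).
# With an instance profile granting the SNS/SQS permissions attached to
# the job cluster, no credential options need to appear here.
autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.inferColumnTypes": "true",
    "cloudFiles.schemaLocation": "dbfs:/autoloader/schema",  # hypothetical path
    "cloudFiles.useNotifications": "true",  # file notification mode, not directory listing
    "cloudFiles.region": "us-east-1",       # placeholder region
}

# Applied to a stream (requires a Spark session; not run here):
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("s3://my-bucket/landing/"))  # hypothetical bucket
```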

2 More Replies
Shaghil
by New Contributor II
  • 867 Views
  • 1 replies
  • 0 kudos

DataBricks Certification Exam Got Suspended. Require support for the same.

I encountered numerous challenges during my exam, starting with issues related to system compatibility and difficulties with my microphone and other settings. Despite attempting to contact support multiple times, it was not easy to get assistance. Aft...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Shaghil , Thank you for posting your concern on Community! To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours).

AbhilashMV
by New Contributor II
  • 797 Views
  • 1 replies
  • 0 kudos

Not able to download Certificate

Hi all, I took the course Get Started With Data Engineering from the course link below: https://www.databricks.com/learn/training/getting-started-with-data-engineering#data-video But after completing the quiz, I am not able to download the certificate. The a...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @AbhilashMV, Thank you for posting your concern on Community! To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours...

Godhuli
by New Contributor II
  • 1100 Views
  • 1 replies
  • 0 kudos

Unable to login to Databricks Community edition

I signed up for Databricks Community Edition with Gmail, verified my account, and used it to create a notebook, but I'm having an issue logging back in even though the email and password provided are correct. I did try the "Forgot Password" step, but I am...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Godhuli, please look at this link related to Community Edition, which might solve your problem. I appreciate your interest in sharing your Community Edition query with us. If you have any more questions or concerns, please don't hesita...

craigr
by New Contributor II
  • 1048 Views
  • 1 replies
  • 1 kudos

Resolved! How are Struct type columns stored/accessed (interested in efficiency)?

Hello, I've searched around for a while and didn't find a similar question here or elsewhere, so I thought I'd ask. I'm assessing the storage/access efficiency of Struct type columns in Delta tables. I want to know more about how Databricks is storing...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @craigr, Let’s delve into the storage and access efficiency of Struct type columns in Delta tables within the context of Databricks. Structured Data Sources and Efficiency: Structured data sources, such as Parquet and ORC, define a schema on t...

dbsuersu
by New Contributor II
  • 952 Views
  • 2 replies
  • 0 kudos

Resolved! ArcGIS Connection

Hi, I am trying to connect to an ArcGIS instance using Databricks. Is this possible? After connecting, I am trying to read the data into a DataFrame. Please help me with this request. If it's not possible to connect, please provide an alternative. Than...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @dbsuersu , Connecting to an ArcGIS instance using Databricks is indeed possible, and I’ll guide you through the process. ArcGIS GeoAnalytics Engine in Databricks: ArcGIS GeoAnalytics Engine (GA Engine) is a powerful plugin for Apache Spark™ t...

1 More Replies
pernilak
by New Contributor III
  • 768 Views
  • 1 replies
  • 0 kudos

Best practices for working with external locations where many files arrive constantly

I have an Azure Function that receives files (not volumes) and dumps them to cloud storage. One to five files are received approximately per second. I want to create a partitioned table in Databricks to work with. How should I do this? E.g.: register the cont...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @pernilak,  Since you’re dealing with a high volume of files arriving approximately every second, creating a partitioned table is a good idea. Partitioning helps optimize query performance and manage large datasets efficiently. Here’s how you ca...
